Abstract

In this paper the asymptotic distribution of estimators is derived in a general regression
setting where rank restrictions on a submatrix of the coefficient matrix are imposed and
the regressors can include stationary or I(1) processes. Such a setting occurs, e.g., in
factor models. Rates of convergence are derived and the asymptotic distribution is given
for least squares estimators as well as fully modified estimators. The gains from imposing
the rank restrictions are investigated. A number of special cases are discussed, including
the Johansen results in the case of cointegrated VAR(p) processes.
1 Introduction

In this paper a multivariate time series $(y_t)_{t \in \mathbb{Z}}$, $y_t \in \mathbb{R}^s$, is modeled as a linear function of
two processes $(z_t^r)_{t \in \mathbb{Z}}$, $z_t^r \in \mathbb{R}^{m_r}$, and $(z_t^u)_{t \in \mathbb{Z}}$, $z_t^u \in \mathbb{R}^{m_u}$ (where 'r' stands for restricted and 'u'
for unrestricted) using the following model:

$$y_t = b_r z_t^r + b_u z_t^u + u_t, \quad t = 1, \dots, T \qquad (1)$$

where $b_r = O\Gamma'$ is of rank $n < \min(s, m_r)$. Such a situation can occur, e.g., for panel data
sets where both $s$ and $m_r$ are large. Throughout, all variables will be assumed to be either
stationary or (co-)integrated. Details on the assumptions for the processes are given below.
For the moment assume that $(u_t)_{t \in \mathbb{Z}}$ is an independent identically distributed (iid) process.

In this situation the asymptotics for the OLS estimators (Park and Phillips, 1988; Park and 
Phillips, 1989) and fully modified (Phillips, 1995) estimators neglecting the rank restriction 
are well documented in the literature. However, if the rank restriction is neglected in the case
that $m_r$ and $s$ are large, the number of parameters to be estimated equals $(m_r + m_u)s$, which
might require excessively large samples in order to allow for reasonable accuracy. As an
alternative, rank restricted regression (RRR) can be used in order to reduce the number
of parameters greatly.
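
The size of this reduction can be made concrete with a small sketch. It assumes the standard count of $n(s + m_r - n)$ free parameters for an $s \times m_r$ matrix of rank $n$; the function names and the numerical values are illustrative only and do not appear in the paper.

```python
def n_params_unrestricted(s: int, m_r: int, m_u: int) -> int:
    """Free parameters in b = [b_r, b_u] when the rank restriction is ignored."""
    return (m_r + m_u) * s


def n_params_rank_restricted(s: int, m_r: int, m_u: int, n: int) -> int:
    """Free parameters when rank(b_r) = n and b_u is left unrestricted.

    A rank-n matrix of size s x m_r has n * (s + m_r - n) degrees of freedom.
    """
    if not 0 <= n <= min(s, m_r):
        raise ValueError("n must lie between 0 and min(s, m_r)")
    return n * (s + m_r - n) + m_u * s


if __name__ == "__main__":
    s, m_r, m_u, n = 50, 40, 5, 3  # illustrative values only
    print(n_params_unrestricted(s, m_r, m_u))        # 2250
    print(n_params_rank_restricted(s, m_r, m_u, n))  # 511
```

For $n = \min(s, m_r)$ both counts coincide, so the restriction only pays off when the rank is small relative to $s$ and $m_r$.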

The RRR framework of equation (1) is also of importance for the estimation involved in
subspace methods (see e.g. Larimore, 1983; Bauer and Wagner, 2002). In these methods a
RRR of the type (1) is the central step in the estimation. Thus understanding the
asymptotic properties of the corresponding estimators requires a thorough understanding of the
asymptotic properties of estimators for (1).

If all involved processes are stationary, the asymptotic theory of RRR estimators based on
OLS is presented in (Reinsel and Velu, 1998). There consistency and asymptotic normality
of the estimated coefficient matrices are stated for the (generic) special case that all singular
values of $b_r$ are distinct. Further, expressions for the asymptotic variance matrix are provided
using implicitly defined quantities which, hence, are not easy to interpret or implement.

For a cointegrated process $X_t$, letting

$$y_t = \Delta X_t = X_t - X_{t-1}, \quad z_t^r = X_{t-1}, \quad z_t^u = [\Delta X_{t-1}', \dots, \Delta X_{t-p+1}']',$$

equation (1) corresponds to the Johansen framework (Johansen, 1995). Also in this case the
asymptotics of quasi maximum likelihood estimators are well known. Although the original
material focuses on the estimation of the cointegrating relations $\Gamma$ extracted as the right factor
in the product $b_r = O\Gamma'$, the asymptotics for the full estimator of $b_r$ can be derived based on
these results, see the evaluations in (Johansen, 1995). The arguments given there rely on
stationarity of $y_t$ and $\Gamma' X_{t-1}$ as well as on the fact that the rank restriction only restricts the
coefficients corresponding to the nonstationary components of $X_{t-1}$, as will be demonstrated
below.

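Equation (1) extends this framework by allowing for more general processes $z_t^r$ and $z_t^u$.
It will be shown below (see Theorem 3.1) that there exist nonsingular transformations $T_y, T_r$
such that

$$T_y b_r T_r^{-1} = \begin{bmatrix} I_{c_y} & 0^{c_y \times (c_r - c_y)} & 0^{c_y \times (m_r - c_r)} \\ 0^{(s - c_y) \times c_y} & 0^{(s - c_y) \times (c_r - c_y)} & b_{2,3} \end{bmatrix} = \underbrace{\begin{bmatrix} I_{c_y} & 0 \\ 0 & O_2 \end{bmatrix}}_{\tilde O} \underbrace{\begin{bmatrix} I_{c_y} & 0 & 0 \\ 0 & 0 & \Gamma_{3,2}' \end{bmatrix}}_{\tilde \Gamma'}$$

and in $\tilde z_t^r = T_r z_t^r$ the first $c_r$ coordinates are integrated, the remaining ones being stationary.
In the Johansen framework $c_y = 0$ holds, while in this paper $0 \le c_y \le n < s$ is allowed
for. Also in $T_y(y_t - b_u z_t^u)$ the first $c_y$ components are integrated, the remaining ones being
stationary.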

In this extended situation the asymptotics of (Johansen, 1995) do not apply, as can be
seen from the following arguments: Using the notation $\langle a_t, b_t \rangle = T^{-1} \sum_{t=1}^{T} a_t b_t'$ for processes
$(a_t)_{t \in \mathbb{Z}}$, $(b_t)_{t \in \mathbb{Z}}$, the consistency proof in Lemma 13.1 of (Johansen, 1995) relies on solving the
generalized eigenvalue problem for the matrices $\langle z_t^{r,\pi}, z_t^{r,\pi} \rangle$ and
$\langle z_t^{r,\pi}, y_t^{\pi} \rangle \langle y_t^{\pi}, y_t^{\pi} \rangle^{-1} \langle y_t^{\pi}, z_t^{r,\pi} \rangle$.

Here $a_t^{\pi}$ denotes the residuals of a regression of $a_t$ onto $z_t^u$. Consistency is shown by transforming
the problem using the matrix $A_T = [\Gamma_\perp T^{-1/2}, \Gamma]$ (changing the order of the block columns to
correspond to our ordering as used below), where the columns of the matrix $\Gamma_\perp$, $(\Gamma_\perp)' \Gamma_\perp = I$,
span the orthogonal complement of the space spanned by the columns of $\Gamma$. Correspondingly,
for $c_y = 0$ in $A_T' z_t^{r,\pi}$ the first components are nonstationary but scaled by $T^{-1/2}$ and the
remaining ones stationary. In the transformed problem

$$\lambda A_T' \langle z_t^{r,\pi}, z_t^{r,\pi} \rangle A_T w - A_T' \langle z_t^{r,\pi}, y_t^{\pi} \rangle \langle y_t^{\pi}, y_t^{\pi} \rangle^{-1} \langle y_t^{\pi}, z_t^{r,\pi} \rangle A_T w = 0$$

all matrices converge to block diagonal matrices. Thus the corresponding eigenvalues and the
matrix of eigenvectors $\hat V_T$ (with a suitable choice of the basis) converge. The eigenvectors
of the transformed problem corresponding to the nonzero eigenvalues are related via $\hat V_T =
A_T^{-1} \hat W_T = [\Gamma_\perp T^{1/2}, \Gamma]' \hat W_T$ (assuming without restriction of generality $\Gamma' \Gamma = I_n$), implying
that $T^{1/2} \Gamma_\perp' \hat W_T$ converges to zero in probability. No almost sure (a.s.) results and no sharper
bounds on the order of convergence are provided in (Johansen, 1995).

For $c_y > 0$, however, $\Gamma' z_t^r$ is nonstationary and $A_T' \langle z_t^{r,\pi}, z_t^{r,\pi} \rangle A_T$ does not converge. Using
instead $A_T$ as

$$A_T = \begin{bmatrix} T^{-1/2} I_{c_y} & 0 & 0 \\ 0 & T^{-1/2} I_{c_r - c_y} & 0 \\ 0 & 0 & \Gamma_{3,2}' \\ 0 & 0 & \Gamma_{3,2,\perp}' \end{bmatrix}$$

where $\Gamma_{3,2,\perp}' \Gamma_{3,2} = 0$, $\Gamma_{3,2,\perp}' \Gamma_{3,2,\perp} = I$, leads to convergence for the generalized eigenvalue problem.
Thus again $\hat V_T$ converges, where the first $c_y$ columns corresponding to the
eigenvalue $\lambda = 1$ converge to $[I_{c_y}, 0]'$. Consequently also $\hat W_T = A_T \hat V_T$ converges. However,
the heading $c_y \times c_y$ subblock of this matrix equals $T^{-1/2} I_{c_y}$ and hence converges to zero, as
does the whole block column. Multiplying the corresponding block column with $T^{1/2}$, the
heading subblock equals the identity matrix as required, but the orders of convergence for
the remaining blocks are reduced by this order and hence the remaining arguments in the
proof of Lemma 13.1 of (Johansen, 1995) no longer apply. Therefore this approach cannot
be used in order to show consistency for the estimator of $\Gamma$ and thus also not of $O$. Due
to this complication (Bauer and Wagner, 2002) were led to provide an adapted estimator by
setting the remaining block rows of the first block column of $\hat V_T$ equal to zero. In this paper
a different route in the proof of consistency of the estimator for $b_r$ is provided, showing that
the adaptation is not needed.

In addition to the changes in the consistency proofs, also the derivation of the asymptotic
distribution of the estimators $\hat O$ and $\hat \Gamma$ of $O$ and $\Gamma$ as provided in Lemma 13.2 of (Johansen,
1995) for the case $c_y = 0$ cannot be used in the case $c_y > 0$, as can be seen from these
arguments: In the last equation on p. 182 the last term $(\hat O - O) \Gamma' \langle z_t^{r,\pi}, z_t^{r,\pi} \rangle$ is shown to tend
to zero using consistency for $\hat O$ and stationarity of $\Gamma' z_t^{r,\pi}$ for $c_y = 0$. For $c_y > 0$, however,
$\Gamma' z_t^{r,\pi}$ contains nonstationary components such that $\langle \Gamma' z_t^{r,\pi}, z_t^{r,\pi} \rangle = O_P(T)$ and moreover
converges in distribution to a nondegenerate distribution when divided by $T$. Hence even
if $O$ is estimated superconsistently such that $T(\hat O - O)$ converges in distribution, the
term $(\hat O - O) \Gamma' \langle z_t^{r,\pi}, z_t^{r,\pi} \rangle$ in the last equation on p. 182 does not vanish. Thus also for the
asymptotic distribution the proof in (Johansen, 1995) does not apply for the case $c_y > 0$ and
a more detailed analysis is needed. It is the main goal of the paper to close this gap in the
literature.
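
The different orders of magnitude of sample second moments for stationary and integrated components, which drive the argument above, can be illustrated numerically. The following is only a sketch with a simulated scalar white noise and its cumulated sum; the seed, the sample size and all names are illustrative assumptions.

```python
import numpy as np


def sample_second_moment(x: np.ndarray) -> float:
    """<x_t, x_t> = T^{-1} sum_t x_t^2 for a scalar series."""
    return float(np.mean(x ** 2))


rng = np.random.default_rng(0)   # fixed seed, illustrative only
T = 4000
eps = rng.standard_normal(T)
stationary = eps                 # white noise: <x, x> stays O(1)
random_walk = np.cumsum(eps)     # I(1) process: <x, x> grows like T

m_stat = sample_second_moment(stationary)
m_rw = sample_second_moment(random_walk)
m_rw_scaled = m_rw / T           # O_P(1) with a nondegenerate limit

print(m_stat, m_rw, m_rw_scaled)
```

Repeating the experiment with different seeds shows that `m_rw_scaled` keeps fluctuating instead of settling at a constant, in line with convergence in distribution rather than in probability.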

In this paper two different estimators are considered: the RRR estimator based on the
unrestricted OLS estimator as well as the one based on the fully modified unrestricted estimator of
(Phillips, 1995). The main contributions of the paper are:

• A full discussion of the asymptotic properties of the RRR estimators is provided, including
conditions for consistency and the derivation of the asymptotic distribution of the estimators under
the condition of known rank $n$.

• For the RRR estimator based on OLS, almost sure (a.s.) rates of convergence are
provided, improving the results in the literature, which provide only in probability convergence.

• Furthermore, in all cases the asymptotic distribution is given explicitly and a detailed
comparison of the relative advantages in a number of special cases is provided.



The organization of this paper is the following: The next section presents the various esti- 
mation algorithms while their corresponding asymptotic properties are discussed in section 3. 
Section 4 illustrates the results using a number of special cases. Finally section 5 summarizes 
the paper. All results are proved in Appendix A. A summary of the notation is contained in 
Appendix B. 

2 Estimation Algorithms 

In this paper four different estimators for the coefficient matrices $b_r, b_u$ in equation (1) based on
observations for time instants $t = 1, \dots, T$ are considered. Throughout, as above, the notation
$\langle a_t, b_t \rangle := T^{-1} \sum_{t=1}^{T} a_t b_t'$ will be used (somewhat sloppily using $a_t, b_t$ for the processes $(a_t)_{t \in \mathbb{Z}}$
and $(b_t)_{t \in \mathbb{Z}}$ and for the variables $a_t, b_t$ for given time instant $t$, respectively).
Using this notation the ordinary least squares (OLS) estimator (that ignores the knowledge
of the rank constraint $\mathrm{rank}(b_r) = n$) can be written as

$$\hat\beta_{OLS} = \langle y_t, z_t \rangle \langle z_t, z_t \rangle^{-1}, \quad \hat\beta_{OLS} = [\hat\beta_{OLS,r}, \hat\beta_{OLS,u}].$$

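If

$$\hat\Xi^+ (\hat\Xi^+)' := \langle y_t^{\pi}, y_t^{\pi} \rangle^{-1}, \quad y_t^{\pi} = y_t - \langle y_t, z_t^u \rangle \langle z_t^u, z_t^u \rangle^{-1} z_t^u,$$

the rank restricted estimator maximizing the quasi maximum likelihood based on the assumption
of iid Gaussian residuals can be defined as

$$\hat\beta_{RRR} = \arg\min_{\beta = [\beta_r, \beta_u] \in \mathbb{R}^{s \times (m_r + m_u)},\ \mathrm{rank}(\beta_r) = n} \mathrm{tr} \left[ \hat\Xi^+ \sum_{t=1}^{T} (y_t - \beta z_t)(y_t - \beta z_t)' (\hat\Xi^+)' \right]$$

and is given by

$$\hat O = (\hat\Xi^+)^{-1} \hat U_n \hat S_n, \quad \hat G' = \hat V_n' \hat\Xi_p^{-1}, \quad \hat\beta_{RRR,r} = \hat O \hat G', \quad \hat\beta_{RRR,u} = \langle y_t - \hat\beta_{RRR,r} z_t^r, z_t^u \rangle \langle z_t^u, z_t^u \rangle^{-1} \qquad (2)$$

using the SVD

$$\hat\Xi^+ \hat\beta_{OLS,r} \hat\Xi_p = \hat U_n \hat S_n \hat V_n' + \hat R_n,$$

where $\hat U_n$ denotes the matrix having as columns the singular vectors corresponding to
the dominant singular values $\hat\sigma_1 \ge \hat\sigma_2 \ge \dots \ge \hat\sigma_n > 0$ contained as the diagonal in the
diagonal matrix $\hat S_n$. The corresponding right singular vectors are contained in $\hat V_n$. Finally
$\hat R_n$ constitutes the approximation error. Here $\hat\Xi_p = \langle z_t^{r,\pi}, z_t^{r,\pi} \rangle^{1/2}$ (where $X^{1/2}$ denotes the
symmetric matrix square root of the square matrix $X$ and $z_t^{r,\pi}$ denotes the residuals from the regression
of $z_t^r$ onto $z_t^u$). Clearly the estimator $\hat\beta_{RRR,r}$ does not depend on the decomposition of $\hat O \hat G'$
into $\hat O$ and $\hat G'$.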

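Note that for this choice of $\hat\Xi^+$ and $\hat\Xi_p$ the columns of $\hat G$ can also be interpreted as the
eigenvectors of the generalized eigenvalue problem

$$\langle z_t^{r,\pi}, z_t^{r,\pi} \rangle \hat G \hat S_n^2 = \langle z_t^{r,\pi}, y_t^{\pi} \rangle \langle y_t^{\pi}, y_t^{\pi} \rangle^{-1} \langle y_t^{\pi}, z_t^{r,\pi} \rangle \hat G.$$

As can be verified straightforwardly, the corresponding estimate $\hat O$ equals the coefficients
for regressing $y_t^{\pi}$ onto $\hat G' z_t^{r,\pi}$. Thus in the Johansen framework the Johansen estimators are
obtained.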
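
The computations behind the OLS-based RRR estimator can be sketched in a few lines of NumPy. The sketch below assumes that the weight is taken as the symmetric inverse square root of the sample second moment of the residuals of $y_t$ and stores the data as arrays with one row per time instant; the function names (`bracket`, `sym_power`, `residuals`, `rrr_ols`) and the simulated data are illustrative and not taken from the paper.

```python
import numpy as np


def bracket(a, b):
    """<a_t, b_t> = T^{-1} sum_t a_t b_t' for data stored as (T, dim) arrays."""
    return a.T @ b / a.shape[0]


def sym_power(m, p):
    """Power of a symmetric positive definite matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(m)
    return (v * w ** p) @ v.T


def residuals(a, z_u):
    """Residuals of a regression of a_t onto z_t^u."""
    coef, *_ = np.linalg.lstsq(z_u, a, rcond=None)
    return a - z_u @ coef


def rrr_ols(y, z_r, z_u, n):
    """Rank restricted regression based on OLS, following the SVD construction."""
    y_pi, z_pi = residuals(y, z_u), residuals(z_r, z_u)
    s_yy, s_zz = bracket(y_pi, y_pi), bracket(z_pi, z_pi)
    beta_ols_r = bracket(y_pi, z_pi) @ np.linalg.inv(s_zz)
    xi_plus = sym_power(s_yy, -0.5)          # symmetric weight for y
    xi_p = sym_power(s_zz, 0.5)              # symmetric square root for z
    u, sv, vt = np.linalg.svd(xi_plus @ beta_ols_r @ xi_p, full_matrices=False)
    o_hat = sym_power(s_yy, 0.5) @ (u[:, :n] * sv[:n])
    g_hat_t = vt[:n] @ sym_power(s_zz, -0.5)
    beta_r = o_hat @ g_hat_t
    beta_u = bracket(y - z_r @ beta_r.T, z_u) @ np.linalg.inv(bracket(z_u, z_u))
    return beta_r, beta_u, o_hat, g_hat_t, beta_ols_r


rng = np.random.default_rng(1)               # illustrative stationary data
T, s, m_r, m_u, n = 500, 4, 3, 2, 1
z_r = rng.standard_normal((T, m_r))
z_u = rng.standard_normal((T, m_u))
b_r = np.outer(rng.standard_normal(s), rng.standard_normal(m_r))  # rank one
b_u = rng.standard_normal((s, m_u))
y = z_r @ b_r.T + z_u @ b_u.T + rng.standard_normal((T, s))
beta_r, beta_u, o_hat, g_hat_t, beta_ols_r = rrr_ols(y, z_r, z_u, n)
```

Choosing `n = min(s, m_r)` reproduces the unrestricted OLS estimate of $b_r$, which gives a simple sanity check of an implementation.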

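(Phillips, 1995) discusses the fully modified (FM) estimators as an alternative to least
squares estimation. The fully modified OLS estimator (FM-OLS) of $\beta$ is defined as

$$\hat\beta^+_{OLS} := \left( \langle y_t, z_t \rangle - \hat\Delta_{\hat u, \Delta z} - \hat\Omega_{\hat u, \Delta z} \hat\Omega_{\Delta z, \Delta z}^{-1} \left( \langle \Delta z_t, z_t \rangle - \hat\Delta_{\Delta z, \Delta z} \right) \right) \langle z_t, z_t \rangle^{-1} \qquad (3)$$

where for processes $(a_t)_{t \in \mathbb{Z}}$ and $(b_t)_{t \in \mathbb{Z}}$ the estimates

$$\hat\Omega_{a,b} = \sum_{j=1-T}^{T-1} w(j/K) \hat\Gamma_{a,b}(j), \quad \hat\Delta_{a,b} = \sum_{j=0}^{T-1} w(j/K) \hat\Gamma_{a,b}(j)$$

are used. As usual $\Delta z_t := z_t - z_{t-1}$. Here $\hat\Gamma_{a,b}(j) := \langle a_t, b_{t-j} \rangle = T^{-1} \sum_{t=1}^{T} a_t b_{t-j}'$ denotes
the estimated covariance sequence, where observations outside of the observed sample are
treated as zeros. Further, $\hat\Omega_{\hat u, \Delta z}$ is estimated using the residuals $\hat u_t = y_t - \hat\beta_{OLS} z_t$. Throughout
we will use subscripts to indicate the processes involved. Additionally, superscripts indicate
components of the processes. A slight difference to the notation of, e.g., Phillips (1995) is that
for the integrated processes $z_t$, say, we index by $\Delta z$ rather than only $z$.

Consequently, for stationary processes $(a_t)_{t \in \mathbb{Z}}$ and $(b_t)_{t \in \mathbb{Z}}$ it follows that $\hat\Omega_{a,b}$ and $\hat\Delta_{a,b}$ are
estimators of the long-run covariance and the one-sided long-run covariance matrices defined
as

$$\Omega_{a,b} = \sum_{j=-\infty}^{\infty} \mathbb{E} a_j b_0', \quad \Delta_{a,b} = \sum_{j=0}^{\infty} \mathbb{E} a_j b_0'.$$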

For the kernel function $w(\cdot)$ occurring in this definition we will use the standard assumptions
(cf. Phillips, 1995):

Assumption K: The kernel function $w(\cdot) : \mathbb{R} \to [-1, 1]$ is a twice continuously differentiable
even function with

(a) $w(0) = 1$, $w'(0) = 0$, $w''(0) \ne 0$,

(b) $w(x) = 0$ for $|x| \ge 1$, with $\lim_{|x| \to 1} w(x) / (1 - |x|)^2 = \text{constant}$.

Further, the bandwidth parameter $K$ in the kernel estimates is chosen proportional to $c_T T^b$
for some $b \in (1/4, 2/3)$, where $c_T$ is slowly varying at infinity (i.e. $c_{Tx}/c_T \to 1$, $\forall x > 0$). □
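
A minimal sketch of the kernel estimates of the two-sided and one-sided long-run covariances is given below. It assumes the Parzen kernel as one example of a kernel of the type described in Assumption K and a bandwidth $K = T^{1/3}$; both choices, as well as the function names, are illustrative assumptions rather than the choices made in the paper.

```python
import numpy as np


def parzen(x):
    """Parzen kernel: w(0) = 1, w'(0) = 0, w''(0) = -12, w(x) = 0 for |x| >= 1."""
    x = np.abs(np.asarray(x, dtype=float))
    inner = 1 - 6 * x ** 2 + 6 * x ** 3
    outer = 2 * (1 - x) ** 3
    return np.where(x <= 0.5, inner, np.where(x <= 1.0, outer, 0.0))


def cov_seq(a, b, j):
    """Estimated covariance T^{-1} sum_t a_t b_{t-j}', zeros outside the sample."""
    T = a.shape[0]
    if j >= 0:
        return a[j:].T @ b[:T - j] / T
    return a[:T + j].T @ b[-j:] / T


def long_run_cov(a, b, K, one_sided=False):
    """Two-sided (Omega) or one-sided (Delta) kernel estimate with bandwidth K."""
    T = a.shape[0]
    jmax = min(T - 1, int(np.ceil(K)))   # the kernel vanishes for |j| >= K
    lags = range(0 if one_sided else -jmax, jmax + 1)
    return sum(parzen(j / K) * cov_seq(a, b, j) for j in lags)


rng = np.random.default_rng(7)
T = 200
a = rng.standard_normal((T, 2))
K = T ** (1 / 3)                         # exponent inside (1/4, 2/3)
omega = long_run_cov(a, a, K)
delta = long_run_cov(a, a, K, one_sided=True)
```

For $a = b$ the two estimates are linked by $\hat\Omega_{a,a} = \hat\Delta_{a,a} + \hat\Delta_{a,a}' - \hat\Gamma_{a,a}(0)$, which is a convenient check for an implementation.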

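Analogously to the RRR estimator derived from the OLS estimator, we derive the new
fully modified RRR estimator (henceforth denoted as FM-RRR) from the FM estimator using
the SVD

$$\hat\Xi^+ \hat\beta^+_{OLS,r} \langle z_t^{r,\pi}, z_t^{r,\pi} \rangle^{1/2} = \hat U_n^+ \hat S_n^+ (\hat V_n^+)' + \hat R_n^+,$$

where as before $\hat U_n^+$ denotes the matrix of left singular vectors, $\hat S_n^+ = \mathrm{diag}(\hat s_1^+, \hat s_2^+, \dots, \hat s_n^+)$ is
the diagonal matrix containing the dominant estimated singular values $\hat s_1^+ \ge \hat s_2^+ \ge \dots \ge \hat s_n^+ > 0$
decreasing in size, and the columns of $\hat V_n^+$ contain the corresponding right singular vectors.
The estimator under the rank restriction $\mathrm{rank}(\beta_r) = n$ then is defined as

$$\hat\beta^+_{RRR,r} = (\hat\Xi^+)^{-1} \hat U_n^+ \hat S_n^+ (\hat V_n^+)' \langle z_t^{r,\pi}, z_t^{r,\pi} \rangle^{-1/2}, \quad \hat\beta^+_{RRR,u} = \hat\beta^+_{OLS,u} + (\hat\beta^+_{OLS,r} - \hat\beta^+_{RRR,r}) \langle z_t^r, z_t^u \rangle \langle z_t^u, z_t^u \rangle^{-1}. \qquad (5)$$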

3 Results 

In this paper the following assumptions on the data generating process (dgp) will be used: 
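Assumption P: The process $(y_t)_{t \in \mathbb{Z}}$ is generated according to (1) with $u_t = A \varepsilon_t$ ($A \in \mathbb{R}^{s \times k}$ of
full row rank), where $(z_t^r)_{t \in \mathbb{Z}}$ and $(z_t^u)_{t \in \mathbb{Z}}$ are processes such that for some orthogonal matrices
$H_r = [H_{r,\parallel}, H_{r,\perp}]$, $H_u = [H_{u,\parallel}, H_{u,\perp}]$ ($H_{r,\parallel} \in \mathbb{R}^{m_r \times c_r}$, $H_{u,\parallel} \in \mathbb{R}^{m_u \times c_u}$) we have

$$\mathrm{diag}(\Delta(L) I_{c_r}, I_{m_r - c_r}) H_r' z_t^r = v_t, \ t \in \mathbb{Z}, \quad \mathrm{diag}(\Delta(L) I_{c_u}, I_{m_u - c_u}) H_u' z_t^u = w_t, \ t \in \mathbb{Z},$$

where $\Delta(L) = 1 - L$ denotes the difference operator ($L$ denoting the backward shift operator)
and the joint vector $\tilde v_t := [v_t', w_t']'$ is a stationary process generated according to

$$\tilde v_t = \sum_{j=1}^{\infty} C_j \varepsilon_{t-j},$$

where $\sum_{j=1}^{\infty} j^a \|C_j\| < \infty$ for some $a > 3/2$ and where for the transfer function $c(z) :=
[c_v(z)', c_w(z)']' = \sum_{j=1}^{\infty} C_j z^j$ (with $z$ denoting a complex variable) the matrix $c(1)$ is of full
row rank. Additionally it is assumed that

$$\mathbb{E} \begin{bmatrix} H_{r,\perp}' z_t^r \\ H_{u,\perp}' z_t^u \end{bmatrix} \begin{bmatrix} H_{r,\perp}' z_t^r \\ H_{u,\perp}' z_t^u \end{bmatrix}' > 0.$$

Here $(\varepsilon_t)_{t \in \mathbb{Z}}$ is an iid process with zero mean, nonsingular variance $\Sigma$ and finite fourth
moments. Finally $H_{r,\parallel}' z_0^r = 0$ and $H_{u,\parallel}' z_0^u = 0$. □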



Note that summation for the joint vector $[v_t', w_t']'$ starts at $j = 1$. Thus uncorrelatedness of the regressors
with the noise is built into the assumptions. The assumptions imply that $z_t^r$ and $z_t^u$ are
I(1) processes such that the cointegrating rank of the joint process equals the sum of the
cointegrating ranks of the two processes.

The assumption of zero initial conditions is not important and can be replaced with
the assumption of deterministic initial conditions, i.e. assuming that modeling is performed
conditional on initial conditions.

The noise is assumed to constitute an iid sequence, which is somewhat restrictive. Weaker
assumptions are possible but make the asymptotic distributions more involved. Further note
that the same noise $\varepsilon_t$ is used to generate the regressors as well as the residuals in the
estimation equation. Consequently lagged $y_t$'s are admitted as regressors and some dynamics
may be included in the model, alleviating the iid assumption.

Furthermore these assumptions exclude deterministic terms such as the constant as re- 
gressors which are discussed separately below. 
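
To make the structure of the data generating process concrete, the following sketch simulates one simple member of the class described in Assumption P. It assumes $H_r = I$, $H_u = I$, Gaussian noise and a transfer function with a single lag, $c(z) = C_1 z$; these simplifications and all names are illustrative assumptions.

```python
import numpy as np


def simulate_model(T, b_r, b_u, c_r, c_u, A, C1, rng):
    """Simulate y_t = b_r z_t^r + b_u z_t^u + u_t with H_r = I and H_u = I.

    The first c_r (c_u) coordinates of z^r (z^u) are I(1), the rest stationary.
    C1 is a hypothetical loading matrix: [v_t', w_t']' = C1 eps_{t-1}.
    """
    m_r = b_r.shape[1]
    eps = rng.standard_normal((T + 1, A.shape[1]))   # eps_0, ..., eps_T
    v = eps[:-1] @ C1.T                              # depends on eps_{t-1} only
    z_r, z_u = v[:, :m_r].copy(), v[:, m_r:].copy()
    z_r[:, :c_r] = np.cumsum(z_r[:, :c_r], axis=0)   # integrate, zero initial value
    z_u[:, :c_u] = np.cumsum(z_u[:, :c_u], axis=0)
    u = eps[1:] @ A.T                                # u_t = A eps_t
    y = z_r @ b_r.T + z_u @ b_u.T + u
    return y, z_r, z_u, u


rng = np.random.default_rng(3)
s, m_r, m_u, c_r, c_u, k, n, T = 4, 3, 2, 1, 1, 6, 2, 200
b_r = rng.standard_normal((s, n)) @ rng.standard_normal((n, m_r))  # rank n
b_u = rng.standard_normal((s, m_u))
A = rng.standard_normal((s, k))                      # generically of full row rank
C1 = rng.standard_normal((m_r + m_u, k))             # generically of full row rank
y, z_r, z_u, u = simulate_model(T, b_r, b_u, c_r, c_u, A, C1, rng)
```

Since the regressors at time $t$ only involve $\varepsilon_{t-1}$ while $u_t$ involves $\varepsilon_t$, regressors and noise are uncorrelated by construction, as noted above.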

The assumptions on the data generating process lead to the following representation result: 

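Theorem 3.1 Let Assumption P hold where $n = \mathrm{rank}(b_r)$, $b = [b_r, b_u]$.

(I) Let $c_y \le n$ denote the rank of $b_r H_{r,\parallel}$. Then the cointegrating rank of $(z_t^r)_{t \in \mathbb{Z}}$ is $m_r - c_r$
and the cointegrating rank of $(y_t - b_u z_t^u)_{t \in \mathbb{Z}}$ is $s - c_y$.

(II) There exist nonsingular matrices $T_y \in \mathbb{R}^{s \times s}$, $T_{z,r} \in \mathbb{R}^{m_r \times m_r}$ and $T_{z,u} \in \mathbb{R}^{m_u \times m_u}$ such that

$$\tilde y_t = \begin{bmatrix} \tilde y_{t,1} \\ \tilde y_{t,2} \end{bmatrix} = T_y (y_t - b_u z_t^u) = \tilde b_r \tilde z_t + \tilde\varepsilon_t = \begin{bmatrix} I_{c_y} & 0 & 0 \\ 0 & 0 & b_{2,3} \end{bmatrix} \begin{bmatrix} \tilde z_{t,1} \\ \tilde z_{t,2} \\ \tilde z_{t,3} \end{bmatrix} + \begin{bmatrix} \tilde\varepsilon_{t,1} \\ \tilde\varepsilon_{t,2} \end{bmatrix},$$

$$\tilde z_t = \begin{bmatrix} \tilde z_{t,1} \\ \tilde z_{t,2} \\ \tilde z_{t,3} \end{bmatrix} = T_{z,r} z_t^r, \quad \tilde z_t^u = T_{z,u} z_t^u = \begin{bmatrix} \tilde z_{t,1}^u \\ \tilde z_{t,2}^u \end{bmatrix},$$

where $\Delta(L) \tilde z_{t,1} = c_{z,1}(L) \varepsilon_t$, $\Delta(L) \tilde z_{t,2} = c_{z,2}(L) \varepsilon_t$, $\Delta(L) \tilde z_{t,1}^u = c_{z,u}(L) \varepsilon_t$, $t \in \mathbb{Z}$ ($L$ denoting the
backward shift operator) and the matrix $[c_{z,1}(1)', c_{z,2}(1)', c_{z,u}(1)']$ is of full column rank, and
$(\tilde z_{t,3})_{t \in \mathbb{Z}}$ and $(\tilde z_{t,2}^u)_{t \in \mathbb{Z}}$ are stationary processes with nonsingular spectrum at $z = 1$.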



The result is proved in Appendix A. It provides the main representation of the regression
on which the asymptotic results are based. Note that the matrices $T_y$, $T_{z,u}$ and $T_{z,r}$
separating the nonstationary and stationary directions of the various processes are not unique
and the theorem only ascertains their existence. The restrictions on the ranks of the various
matrices ensure that the various components are either stationary processes which are not
overdifferenced or integrated processes which are not cointegrated.

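Under these assumptions it is well known that the OLS estimators are weakly consistent
(Park and Phillips, 1988; Park and Phillips, 1989). Furthermore, almost sure consistency as
well as the convergence rate $\hat\beta_{OLS} - b = O(\sqrt{\log\log T / T})$ (i.e. $\sqrt{T / \log\log T}\, (\hat\beta_{OLS} - b)$ is
almost surely (a.s.) bounded) can be derived, see e.g. (Bauer, 2009). Additionally, their
asymptotic distribution is also well documented: Let $T_z = \mathrm{diag}(T_{z,r}, T_{z,u})$ and let $D_z =
\mathrm{diag}(D_{z,r}, D_{z,u})$, $D_{z,r} = \mathrm{diag}(T^{-1} I_{c_r}, T^{-1/2} I_{m_r - c_r})$, $D_{z,u} = \mathrm{diag}(T^{-1} I_{c_u}, T^{-1/2} I_{m_u - c_u})$. Then
one obtains

$$T_y (\hat\beta_{OLS} - b) T_z^{-1} D_z^{-1} \xrightarrow{d} \left[ M_r,\ Z_r,\ M_u - M_r N_r,\ Z_u - Z_r \mathbb{E} \tilde z_{t,3} (\tilde z_{t,2}^u)' (\mathbb{E} \tilde z_{t,2}^u (\tilde z_{t,2}^u)')^{-1} \right]$$

where (using the notation$^1$ $f(E, W) = \int dE\, W' (\int W W')^{-1}$)

$$M_r = f(T_y A W, W_z^{\pi}), \quad M_u = f(T_y A W, W_u), \quad N_r = \int W_z W_u' \left( \int W_u W_u' \right)^{-1},$$

$$\mathrm{vec}\Big( \sum_{t=1}^{T} \cdots \Big) \xrightarrow{d} \mathrm{vec}(Z_r), \quad \mathrm{vec}\Big( \sum_{t=1}^{T} \cdots \Big) \xrightarrow{d} \mathrm{vec}(Z_u),$$

where $\tilde z_{t,3}^{\pi} := \tilde z_{t,3} - \langle \tilde z_{t,3}, \tilde z_{t,2}^u \rangle \langle \tilde z_{t,2}^u, \tilde z_{t,2}^u \rangle^{-1} \tilde z_{t,2}^u$. $W$ denotes the Brownian motion corresponding
to $(\varepsilon_t)_{t \in \mathbb{N}}$ and $W_z = c_{z,1:2}(1) W$, $W_u = c_{z,u}(1) W$, $W_z^{\pi} = W_z - \int W_z W_u' (\int W_u W_u')^{-1} W_u$. Further,
$\mathrm{vec}(Z_r)$ and $\mathrm{vec}(Z_u)$ are normally distributed with mean zero (vec denotes columnwise
vectorisation). Finally $c_{z,1:2}(1) := [c_{z,1}(1)', c_{z,2}(1)']'$.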
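
Draws from limits of the form $f(E, W) = \int dE\, W' (\int W W')^{-1}$ can be approximated by replacing the Brownian motions with scaled random walks. The following sketch does this for a scalar example with $E = W$; the grid size, the seed and the function name are illustrative assumptions.

```python
import numpy as np


def f_functional(dE, W_prev):
    """Discretised f(E, W) = int dE W' (int W W')^{-1}.

    dE     : (T, p) increments of E on a grid with mesh 1/T
    W_prev : (T, q) values of W at the left end point of each increment
    """
    T = dE.shape[0]
    int_dE_W = dE.T @ W_prev               # approximates int dE W'
    int_W_W = W_prev.T @ W_prev / T        # approximates int W W' dw
    return int_dE_W @ np.linalg.inv(int_W_W)


rng = np.random.default_rng(2)
T = 1000
dW = rng.standard_normal((T, 1)) / np.sqrt(T)        # Brownian increments
W = np.cumsum(dW, axis=0)
W_prev = np.vstack([np.zeros((1, 1)), W[:-1]])       # W at the left end points
draw = f_functional(dW, W_prev)                      # one draw of f(W, W)
```

Repeating the last four lines over many seeds gives a Monte Carlo approximation of the non-normal 'unit root' type distribution that appears in the nonstationary directions.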

The next theorem, which is the main contribution of this paper, extends these results to 
the RRR estimators: 



$^1$ Here and below $\int dE\, W'$ is the usual shorthand notation for $\int_0^1 dE(\omega) W(\omega)'$ for Brownian motions
$E(\omega), W(\omega)$, $\omega \in [0, 1]$. Analogously $\int W W'$ is short for $\int_0^1 W(\omega) W(\omega)' d\omega$.



Theorem 3.2 (I) Let the assumptions of Theorem 3.1 hold. Then $\hat\beta_{RRR} - b = O((\log T)^a / \sqrt{T})$.
Furthermore let $\hat\beta_{RRR,r}$ and $\hat\beta_{OLS,r}$ denote the coefficients corresponding to $z_t^r$. Then the
asymptotic distribution of $\hat\beta_{RRR,r}$ can be found from



[Ty0RRR, r -PoLS,r)T z /]D 



-li n-1 A 
z.r 



where 



(I - d 2 dl)M r>2 [ -Y 21 Y{? I } 



R 

(i - d 2 d\)z r ^p J _ 



-[I,0]T y AEe t y' tj2 (E$ 2 y' t 



$$M_{r,2} = f([0, I] T_y A W,\ W_{z,2}^{\pi} - Y_{21} Y_{11}^{-1} W_{z,1}^{\pi}),$$

$$Y_{i1} = \int W_{z,i}^{\pi} (W_{z,1}^{\pi})', \quad i = 1, 2,$$

$$P = I - \mathbb{E} \tilde z_{t,3}^{\pi} (\tilde z_{t,3}^{\pi})' \Gamma_{3,2} \Gamma_{3,2}^{\dagger}, \quad \Gamma_{3,2}^{\dagger} = (\Gamma_{3,2}' \mathbb{E} \tilde z_{t,3}^{\pi} (\tilde z_{t,3}^{\pi})' \Gamma_{3,2})^{-1} \Gamma_{3,2}',$$

$$b_{2,3} = O_2 \Gamma_{3,2}', \quad O_2^{\dagger} = (O_2' (\mathbb{E} \tilde y_{t,2}^{\pi} (\tilde y_{t,2}^{\pi})')^{-1} O_2)^{-1} O_2' (\mathbb{E} \tilde y_{t,2}^{\pi} (\tilde y_{t,2}^{\pi})')^{-1}.$$

Here $Z_{r,2} = [0, I] Z_r$, $W_z^{\pi} = [(W_{z,1}^{\pi})', (W_{z,2}^{\pi})']'$ and $\tilde z_{t,3}^{\pi} = \tilde z_{t,3} - \mathbb{E} \tilde z_{t,3} (\tilde z_{t,2}^u)' (\mathbb{E} \tilde z_{t,2}^u (\tilde z_{t,2}^u)')^{-1} \tilde z_{t,2}^u$,
and $\tilde y_{t,2}^{\pi}$ is defined analogously. Finally $R$ is defined in Lemma A.9. Correspondingly, letting
$\hat\beta_{RRR,u}$ and $\hat\beta_{OLS,u}$ denote the coefficients corresponding to $z_t^u$, then

$$T_y (\hat\beta_{RRR,u} - \hat\beta_{OLS,u}) T_{z,u}^{-1} D_{z,u}^{-1} = - T_y (\hat\beta_{RRR,r} - \hat\beta_{OLS,r}) T_{z,r}^{-1} D_{z,r}^{-1} \begin{bmatrix} N_r & 0 \\ 0 & \mathbb{E} \tilde z_{t,3} (\tilde z_{t,2}^u)' (\mathbb{E} \tilde z_{t,2}^u (\tilde z_{t,2}^u)')^{-1} \end{bmatrix} + o_P(1).$$

(II) All results hold true in the situation that all observations are demeaned or detrended
prior to estimation, if a.s. rates are replaced with in probability rates, the Brownian motions
are replaced by their corresponding demeaned or detrended versions, and if in addition to the
assumptions above the condition $\sum_{j=1}^{\infty} j^a \|C_j\| < \infty$ holds for some $a > 3$.

Note that the decomposition of $b_{2,3}$ is not specified. The asymptotic distribution does not
depend on the actual choice.

The theorem shows how the inclusion of the rank constraint affects the estimation error,
which is given as the sum of the error for the unrestricted estimate plus a correction term.
All coefficients corresponding to the nonstationary directions in $z_t$ are estimated $T$-consistently
and asymptotically the estimation errors have 'matrix unit root' distributions, whereas for
directions in which $(z_t)_{t \in \mathbb{N}}$ is stationary the coefficients are only $\sqrt{T}$-consistent and the errors
are asymptotically normal. The proof of this theorem is given in Appendix A. Note that
the larger bounds in the almost sure convergence rates for the restricted estimator reflect
only the techniques of proof used and not the accuracy of the estimators, which is more
appropriately represented in the distributional results. That is, the larger bounds for the rank
restricted estimator mirror our inability to prove the tighter bounds rather than the relative
accuracy of the estimators.

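In the fully modified case conditions for consistency and the asymptotic distribution of the
unrestricted estimator are provided in (Phillips, 1995): Under the assumptions on the kernel
provided in Assumption K one obtains

$$T_y (\hat\beta^+_{OLS} - b) T_z^{-1} D_z^{-1} \xrightarrow{d} \left[ M_r^+,\ Z_r,\ M_u^+ - M_r^+ N_r,\ Z_u - Z_r \mathbb{E} \tilde z_{t,3} (\tilde z_{t,2}^u)' (\mathbb{E} \tilde z_{t,2}^u (\tilde z_{t,2}^u)')^{-1} \right]$$

where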

b = A^-n^^J" 1 

$$M_r^+ = f(B, W_z^{\pi}), \quad M_u^+ = f(B, c_{z,u}(1) W).$$

Here the superscript $n$ refers to the nonstationary directions in $[\tilde z_t', (\tilde z_t^u)']'$ and the matrices
$\Omega_{u,\Delta z}^{:,n}$ and $\Omega_{\Delta z,\Delta z}^{n,n}$ are composed of the respective columns and rows corresponding to the
nonstationary components.

The next theorem discusses the properties of the corresponding rank restricted estimator: 

Theorem 3.3 Let the assumptions of Theorem 3.2 hold and additionally assume that a kernel
function fulfilling Assumption K is used in the nonparametric estimation of the long run variances.
Let $\hat\beta^+_{RRR}$ denote the FM-RRR estimator (based on the fully modified estimator $\hat\beta^+_{OLS}$)
defined in (5) for the weight $\hat\Xi^+$ as defined in (4). Then using the notation of Theorem 3.2 it
holds that

[%0RRR,r~PoLS,r)7z,r] D z,l 

where $M_2^+ = f([0, I] T_y A B,\ W_{z,2}^{\pi} - Y_{21} Y_{11}^{-1} W_{z,1}^{\pi})$.

Therefore the relation between the restricted and the unrestricted regressions is identical
for the conventional and the fully modified case. Also the expressions for the two sets of
estimators are identical except for the use of $W$ in the conventional case, which is replaced
by $B$ in the fully modified case. Therefore it follows that the distribution in the direction
of (asymptotically) stationary components of $\tilde z_t$ is identical for both estimators. Hence in



c z Mi)w 

c z , u {l)W 



(l-o 2 oI)m 2 + [ -i^ni 1 i 



R 

(i - d 2 dhz r oP _ 



the case that $c_z = 0$, and therefore no integration is present, the conventional and the fully
modified estimators have the same asymptotic distribution. This is true for the restricted and
the unrestricted estimates. We refrain from a more complete discussion of the properties of
the fully modified estimator since for the unrestricted case these are well documented in the
literature. Instead a number of special cases will be discussed below.

4 Special Cases 

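First consider the case where all included variables are stationary. In that case $T_y = I$, $T_z = I$
can be used and the asymptotic distributions of the vectorizations of $\sqrt{T}(\hat\beta_{OLS} - b)$ and
$\sqrt{T}(\hat\beta^+_{OLS} - b)$ are both normal with mean zero and variance $(\mathbb{E} z_t z_t')^{-1} \otimes A \Sigma A'$, which equals
the distribution of

$$\mathrm{vec}\left( [Z_r,\ Z_u - Z_r \mathbb{E} \tilde z_{t,3} (\tilde z_{t,2}^u)' (\mathbb{E} \tilde z_{t,2}^u (\tilde z_{t,2}^u)')^{-1}] \right)$$

noting that in this case $\tilde z_t = \tilde z_{t,3} = z_t^r$ and $\tilde z_t^u = \tilde z_{t,2}^u = z_t^u$ can be chosen. The correction due to the
rank restriction for $\hat\beta_r$ equals the vectorization of

$$-(I - O_2 O_2^{\dagger}) Z_r (I - \mathbb{E} \tilde z_{t,3}^{\pi} (\tilde z_{t,3}^{\pi})' \Gamma_{3,2} \Gamma_{3,2}^{\dagger}).$$

The corresponding correction to $b_u$ follows. In total, hence, one obtains as the asymptotic
distribution of the RRR estimator for $b_r$ the distribution of $Z_r - (I - O_2 O_2^{\dagger}) Z_r (I -
\mathbb{E} \tilde z_{t,3}^{\pi} (\tilde z_{t,3}^{\pi})' \Gamma_{3,2} \Gamma_{3,2}^{\dagger})$.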

This asymptotic distribution (for a generic case) has previously been documented in
(Reinsel and Velu, 1998) on p. 45 (2.36), albeit in a different form which is less accessible.
On p. 46 a more explicit expression for the case $n = 1$ is given. It is straightforward
to see that the expressions in this special case are identical, while the formulas provided above
also provide insights in the general case. It must be noted, however, that these expressions
are not new and have been used already, e.g., in (Bauer, Deistler and Scherrer, 1999).

The consequences of the correction using the rank restriction are the following: Premultiplying
the asymptotic distribution with $O_2^{\dagger}$ one notices that the rank restriction does not
influence the distribution in these directions. In the orthogonal complement, however, the
distribution is changed from $x' Z_r$ to $x' Z_r \mathbb{E} \tilde z_{t,3}^{\pi} (\tilde z_{t,3}^{\pi})' \Gamma_{3,2} \Gamma_{3,2}^{\dagger}$ and hence projected onto the rows
corresponding to the space spanned by the columns of $\Gamma_{3,2}$. The analogous statements hold



for the postmultiplication with Note that these arguments also hold in the general case 
for the (2,3) block of b r . 
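
The algebra behind these statements can be checked numerically. The sketch below builds $O_2 O_2^{\dagger}$ and $P$ from randomly generated matrices, where `S_y` and `S_z` stand in for the second moment matrices of the residual processes; the dimensions, the seed and the function names are illustrative assumptions.

```python
import numpy as np


def correction_projectors(O2, Gamma32, S_y, S_z):
    """Return (O_2 O_2^dagger, P) for S_y = E y y' and S_z = E z z'."""
    S_y_inv = np.linalg.inv(S_y)
    O2_dag = np.linalg.solve(O2.T @ S_y_inv @ O2, O2.T @ S_y_inv)
    G_dag = np.linalg.solve(Gamma32.T @ S_z @ Gamma32, Gamma32.T)
    P = np.eye(S_z.shape[0]) - S_z @ Gamma32 @ G_dag
    return O2 @ O2_dag, P


def random_spd(dim, rng):
    """A generic symmetric positive definite matrix."""
    m = rng.standard_normal((dim, dim))
    return m @ m.T + dim * np.eye(dim)


rng = np.random.default_rng(5)
S_y, S_z = random_spd(4, rng), random_spd(3, rng)
O2, Gamma32 = rng.standard_normal((4, 2)), rng.standard_normal((3, 2))
OO, P = correction_projectors(O2, Gamma32, S_y, S_z)

# Directions in the column space of O_2 are left unchanged by O_2 O_2^dagger,
# and P annihilates E z z' Gamma_{3,2}.
print(np.allclose(OO @ O2, O2), np.allclose(P @ S_z @ Gamma32, 0.0))
```

If $\Gamma_{3,2}$ is square and nonsingular, the same computation returns $P = 0$, so that no correction arises in the stationary directions.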

As a second special case consider the VAR(1) I(1) model of Anderson (2002). For simplicity
of notation the transformed system will be used, which in the notation of Anderson (2002) is
stated as

$$\Delta X_t = \Upsilon X_{t-1} + W_t = \begin{bmatrix} 0 & 0 \\ 0 & \Upsilon_{22} \end{bmatrix} X_{t-1} + W_t, \quad t \in \mathbb{N} \qquad (6)$$

for $X_0 = 0$, where $\Upsilon_{22}$ is nonsingular. This defines an I(1) process $X_t \in \mathbb{R}^s$ whose first
component, $X_{t,1} \in \mathbb{R}^{c_y}$ say, is integrated, the remaining, $X_{t,2} \in \mathbb{R}^{s - c_y}$ say, being stationary
for $|\lambda_{\max}(I + \Upsilon_{22})| < 1$, which is assumed in the following. The variance of the iid white noise
$W_t$ is taken to be $[\Sigma_{WW}^{ij}]_{i,j=1,2}$, which is partitioned according to the partitioning of $X_t$. In our
notation no transformation matrices are needed since the system is already in the appropriate
coordinate system. Hence $\tilde y_t = \tilde y_{t,2} = \Delta X_t$ is stationary, $\tilde z_{t,2} = X_{t-1,1}$, $\tilde z_{t,3} = X_{t-1,2}$, and $\tilde z_{t,1}$ does
not occur. Consequently $b_r = \Upsilon = [\Upsilon_{:,1}, \Upsilon_{:,2}]$ and $b_u$ does not occur.

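In this situation (Anderson, 2002) gives the asymptotic distribution of the unrestricted and
the restricted estimates of $\Upsilon$. Consider first $\Upsilon_{:,1}$, i.e. the first block column. Then Theorem
1 of (Anderson, 2002) states that $T \hat\Upsilon_{:,1,OLS} \xrightarrow{d} J_{:,1} I_{11}^{-1}$, which in our notation equals $f(W, W_1)$,
where $W$ is the Brownian motion corresponding to $u_t = W_t$ and $W_1$ denotes the corresponding
first block. For $\hat\beta_{RRR}$ (Anderson, 2002) states the asymptotic distribution of the first block
column as

$$T \hat\Upsilon_{:,1,RRR} \xrightarrow{d} \begin{bmatrix} 0 \\ J_{2.1} I_{11}^{-1} \end{bmatrix}, \quad J_{2.1} = [-\Sigma_{WW}^{21} (\Sigma_{WW}^{11})^{-1}, I] J_{:,1}.$$

From Theorem 3.2 we obtain

$$T \hat\beta_{:,1,RRR} = T \hat\beta_{:,1,OLS} + T (\hat\beta_{:,1,RRR} - \hat\beta_{:,1,OLS}) \xrightarrow{d} f(W, W_1) - (I - O_2 O_2^{\dagger}) f(W, W_1) = O_2 O_2^{\dagger} J_{:,1} I_{11}^{-1}.$$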

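The second block column of $b_r$ provides the decomposition $O_2 := [0, I]'$, $\Gamma_{3,2}' := \Upsilon_{22}$. Here
$O_2 O_2^{\dagger} = [0, I]' ([0, I] \Sigma_{y_2,y_2}^{-1} [0, I]')^{-1} [0, I] \Sigma_{y_2,y_2}^{-1}$, where

$$\Sigma_{y_2,y_2}^{-1} = \begin{bmatrix} (\Sigma_{WW}^{11})^{-1} & 0 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} -(\Sigma_{WW}^{11})^{-1} \Sigma_{WW}^{12} \\ I \end{bmatrix} Q \begin{bmatrix} -\Sigma_{WW}^{21} (\Sigma_{WW}^{11})^{-1} & I \end{bmatrix}$$

(for some matrix $Q$) according to the block matrix inversion. Thus $O_2 O_2^{\dagger} = [0, I]' [-\Sigma_{WW}^{21} (\Sigma_{WW}^{11})^{-1}, I]$,
showing the identity of the expressions.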
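
The block inversion argument can be verified numerically for a generic positive definite weight matrix. The following is a sketch only; the block sizes and the seed are illustrative, and `Sigma` plays the role of the second moment matrix whose leading blocks coincide with those of the noise variance.

```python
import numpy as np

rng = np.random.default_rng(4)
c, s = 2, 5                                   # illustrative block sizes
M = rng.standard_normal((s, s))
Sigma = M @ M.T + s * np.eye(s)               # generic positive definite matrix
S11, S21 = Sigma[:c, :c], Sigma[c:, :c]

O2 = np.vstack([np.zeros((c, s - c)), np.eye(s - c)])   # O_2 = [0, I]'
Sigma_inv = np.linalg.inv(Sigma)
O2_dag = np.linalg.solve(O2.T @ Sigma_inv @ O2, O2.T @ Sigma_inv)

lhs = O2 @ O2_dag                             # O_2 O_2^dagger with weight Sigma^{-1}
rhs = O2 @ np.hstack([-S21 @ np.linalg.inv(S11), np.eye(s - c)])
print(np.allclose(lhs, rhs))
```

Only the first block column of `Sigma` enters the right hand side, which is why the lower right block of the weight matrix plays no role in the limit.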



For the FM estimator note that the involved long run covariances equal

$$\Omega_{u,\Delta z}^{:,n} = \Sigma_{WW}^{:,1}, \quad \Omega_{\Delta z,\Delta z}^{n,n} = \Sigma_{WW}^{11},$$

with all other terms being zero. Correspondingly

$$\Omega_{u,\Delta z}^{:,n} (\Omega_{\Delta z,\Delta z}^{n,n})^{-1} = [I, (\Sigma_{WW}^{11})^{-1} \Sigma_{WW}^{12}]' \ \Rightarrow\ \begin{bmatrix} 0 \\ J_{2.1} I_{11}^{-1} \end{bmatrix} = f(B, W_1).$$

Thus it follows that the coefficients to the nonstationary regressors for the FM estimator
have the same asymptotic distribution as the RRR estimators. Adding the rank restriction in
this case does not change the asymptotic distribution, while it might well influence the finite
sample properties. It is straightforward to show that in this case also the FM-RRR estimator
has the same distribution for the columns corresponding to the integrated regressors.

With respect to the stationary directions it is easy to see that $P = 0$ since $\Gamma_{3,2}' = \Upsilon_{22}$ is
invertible. Consequently the RRR estimator and the OLS estimator have the same asymptotic
distribution in the columns corresponding to the stationary regressors. Since FM and
OLS estimators have the same asymptotic behavior for stationary regressors, all four estimators
show the same asymptotic behavior in these columns. The underlying reason for this
is that the rank restriction exclusively applies to the nonstationary directions, where the
corresponding coefficient is restricted to zero. For the stationary regressors there are no other
rank restrictions in this case.

Adding lagged first differences to (6), the AR(p) setting with transformed coordinates
is obtained. The additional coefficients are not restricted (except for the seldom
imposed restriction of stability of the corresponding transfer function) and hence in this case
$\tilde z_t^u = \tilde z_{t,2}^u = [\Delta X_{t-1}', \dots, \Delta X_{t-p+1}']'$, i.e. additional stationary regressors are present. It is well
known that in this case (using the usual notation such that $\Upsilon = \alpha\beta'$, where $\alpha_\perp' \alpha = 0$, $\beta_\perp' \beta = 0$
for orthogonal matrices $\alpha_\perp, \beta_\perp$ of maximal dimension such that the columns span the orthogonal
complement of $\alpha, \beta$ respectively) we have



for some stationary process $w_t$ and nonsingular matrix $T_1$ (expressions could be given but are
not of importance in the following and hence omitted). In the example $\alpha' = [0, \Upsilon_{22}]$, $\beta' = [0, I]$
and thus $\alpha_\perp' = [I, 0]$, $\beta_\perp = [I, 0]'$.





The changes in the asymptotic distribution are the following: $W$ is unchanged while $W_z =
T_1^{-1} W_1$. The stationary components change accordingly. Imposing the rank restriction (as is
done in the Johansen quasi-ML estimators) does not change the asymptotic distribution of the
coefficients corresponding to the stationary terms, as in the AR(1) case presented above, since
$\Gamma_{3,2}$ again is nonsingular and hence $P = 0$. Thus we obtain the same asymptotic distribution
as in the unrestricted case. This asymptotic distribution is also given in Theorem 13.5 of
(Johansen, 1995).

For the coefficients corresponding to nonstationary coordinates we obtain analogously to
above

$$T \hat\beta_{:,1,RRR} = T \hat\beta_{:,1,OLS} + T (\hat\beta_{:,1,RRR} - \hat\beta_{:,1,OLS}) \xrightarrow{d} O_2 O_2^{\dagger} f(W, W_z) = \begin{bmatrix} 0 \\ I \end{bmatrix} [-\Sigma_{WW}^{21} (\Sigma_{WW}^{11})^{-1}, I] f(W, W_z).$$

For the FM estimator note that the involved long run covariances equal

$$\Omega_{u,\Delta z}^{:,n} = \Sigma_{WW}^{:,1} (T_1^{-1})', \quad \Omega_{\Delta z,\Delta z}^{n,n} = T_1^{-1} \Sigma_{WW}^{11} (T_1^{-1})'$$

due to the change in the nonstationary directions. Then $\Omega_{u,\Delta z}^{:,n} (\Omega_{\Delta z,\Delta z}^{n,n})^{-1} = [I, (\Sigma_{WW}^{11})^{-1} \Sigma_{WW}^{12}]' T_1$
as above implies that again the unrestricted FM estimator has the same asymptotic distribution
as the RRR estimator. This is remarkable since the FM estimator does not require the
specification of the rank restriction. This has already been observed in (Phillips, 1995) but
apparently did not draw the attention of the community.

5 Conclusions

In this paper the asymptotic properties of two estimators in a regression setting explicitly
imposing a rank restriction are discussed. Besides providing (almost sure) rates of convergence,
explicit expressions for the asymptotic distribution of transformed estimators (such that
stationary and nonstationary coordinates are separated) are also provided. These expressions
reveal the main characteristics of the estimators and allow insights into the relative merits of
the various methods, such as the gain in asymptotic accuracy obtained by imposing the rank
restriction. In particular it is shown that the fully modified estimators in many situations
achieve the same asymptotic distribution as the rank restricted regression OLS estimators
without imposing the rank restriction. This is an attractive feature in situations where the
rank is not known.



The results contain a number of well known situations as special cases and even in some
of these cases allow new insights, as the previously published expressions for the asymptotic
distribution are much more complicated to interpret.

Finally it must be noted that the results in this paper are intermediate results
that might in many cases not seem to be relevant, as they relate to transformed estimators
where the transformations are not known during the estimation. Nevertheless, the results
are important ingredients for exploring the properties of procedures that use the RRR as an
intermediate step. An important example is the class of subspace methods in the case of cointegrated
processes. These results will be presented elsewhere.
A Proofs

Throughout the appendix the following notation will be heavily used: For a sequence of random
matrices $F_T$ with elements $F_{i,j,T}$ and a sequence of scalars $g_T$ we will use the notation
$F_T = o(g_T)$ if $\limsup_{T\to\infty} \max_{i,j} |F_{i,j,T}/g_T| = 0$ almost surely (a.s.). Similarly $F_T = O(g_T)$
if there exists a constant $M$ such that $\limsup_{T\to\infty} \max_{i,j} |F_{i,j,T}/g_T| < M$ a.s. The
corresponding in probability versions are: $F_T = o_P(g_T)$ if $\max_{i,j} |F_{i,j,T}/g_T| \to 0$ in
probability and $F_T = O_P(g_T)$ if for each $\varepsilon > 0$ there exists a constant $M(\varepsilon) < \infty$ such that
$\lim_{T\to\infty} P\{\max_{i,j} |F_{i,j,T}/g_T| > M(\varepsilon)\} < \varepsilon$. In all these statements $T$ denotes the sample
size. Therefore in particular convergence in distribution to a finite dimensional almost surely
finite random variable implies the rate $O_P(1)$. Throughout convergence in probability will
be denoted as $\xrightarrow{p}$ and convergence in distribution as $\xrightarrow{d}$. Almost sure (a.s.) convergence
is denoted as $\to$. $\|.\|$ denotes the Euclidean norm if not stated explicitly otherwise. $\|.\|_{Fr}$
is used to denote the Frobenius norm. As usual the integral $\int W_1 W_2'$ is short notation for
$\int_0^1 W_1(\omega)W_2(\omega)'d\omega$ and $\int dW_1 W_2'$ is short for $\int_0^1 dW_1(\omega)W_2(\omega)'$. Here $W_1(\omega)$ and $W_2(\omega)$ are
two Brownian motions on [0, 1].

A.1 Preliminary lemmas

Lemma A.l (I) Let (st)tez. denote a white noise sequence which fulfills the noise assump- 
tions contained in Assumption P. Define xt,i '■= Yl'j=i £ j^ — 2, £1 = 0, vt '■= c v {L)et := 
Yl'iLi C v ,i£t-i,t £ N, for some transfer function c v {z) := Y^jLi Cvjz 3 where Y^jLi \\Cv,j\\j a < 
oo for some a > 3/2. Furthermore n t := ^5=0 * e ^> f or a sequence C n j such that 

J2JLo \\C n ,j\\j a < oo, a > 3/2 and for c n (z) = J^'jLo Cn,jzi it holds that det c„(l) / 0. Finally 
let Q T := ydog log T/T. Then 

\\{v t ,e t )\\=0(Q T ) , \\(v t ,v t )-Ev t v' t \\=0(Q T ), 
(x t ,i,x ttl ) = O(TloglogT) , {x ttl ,e t ) = O(logT), 
11(^,1,^)11 =0(logT) , (xt^x^)- 1 =0(Q 2 T ). 

All expressions remain true if x tj i is replaced by nt- 

(II) Furthermore using A v ^ n = Yl < jLo^ ,v j(^ n oY where An t = c n (L)et,t £ Z we have: 

(v t ,n t ) A J c v (l)dWW'c n (iy + A vAn , 

T-^n^nt) A c n (l) J WW'cnil)', 

vec{T 1 ' 2 {e u vt)) 4 jV(0,Ev t v' t ®Ee t e' t ) 

where vec denotes column wise vectorization, $W(\omega)$ denotes the limiting Brownian motion
corresponding to $T^{-1/2}\sum_{j=1}^{\lfloor T\omega\rfloor} \varepsilon_j$. Finally $\mathcal{N}(0,V)$ denotes a Gaussian random variable with
mean zero and variance $V$.

(III) Let xt := [x' t> i, x' tl ]' where (xt t ,)teN fulfills the same restrictions as (vt)teN under (I) 
and (xt,i)teN and (nt)t<=N are integrated and of the same form as (n t )teN under (I). Further 
let (vt)t<=n and (wt)t<=m be two stationary processes fulfilling the assumption of (vt)teN under 
(I). Let (st)tez be as under (I). Let 77 denote the residuals of a regression onto xt and let 
n denote the corresponding limits (whenever the symbol is used the limit exists). Hence e.g. 
vf = v t - (vt,xt){xt,xt)~ 1 x t . Then 

<e t ,<) = (e t ,v?) + (T-V2) = 0(Q T ), 



(£ t ,<) = (e t ,n t ) - (e t ,x tyl )(x t ,i,x tA ) 1 (x t ,i,n t ) + o(l) = 0(logT(loglogT) 2 ), 

«,<) = <^ n ,^ n ) + 0(Q T ) = 0(l), 

= 0(logT(loglogT) 2 ), 

«,<) = (n t ,n t ) - <n t ,x t ,i)<x t> i,x ti i)- 1 <x t) i,n t ) + o(T) = 0(T (log log T)). 

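(IV) Let $d_t := [1, t]'$ and $D_d := \mathrm{diag}(1, T^{-1})$. For any process $(a_t)_{t\in\mathbb{N}}$ let $\tilde a_t$ denote the
detrended process $\tilde a_t := a_t - \langle a_t, d_t\rangle\langle d_t, d_t\rangle^{-1}d_t$. Let $v_t = \sum_{j=0}^{\infty} C_{v,j}\varepsilon_{t-j}$ where
$\sum_{j=0}^{\infty} j^2\|C_{v,j}\| < \infty$. Then $\|\langle \tilde v_t, \tilde v_t\rangle - Ev_tv_t'\| = O_P(Q_T)$. Further for $(x_t)_{t\in\mathbb{N}}$ as in (I) it follows that
$\langle \varepsilon_t, \tilde x_t\rangle = O_P(1)$, $\langle \tilde v_t, \tilde x_t\rangle = O_P(1)$. The same holds for replacing $x_t$ with $n_t := \sum_{j=0}^{\infty} C_{n,j}x_{t-j}$ if
$\sum_{j=0}^{\infty} \|C_{n,j}\| j^3 < \infty$.

The following limit theorems hold:

$$\langle \varepsilon_t, \tilde x_t\rangle \xrightarrow{d} \int dW\,\tilde W', \qquad T^{-1}\langle \tilde x_t, \tilde x_t\rangle \xrightarrow{d} \int \tilde W \tilde W'$$

where $\tilde W(\omega) := W(\omega) - (\int W(s)ds)(4 - 6\omega) - (\int sW(s)ds)(12\omega - 6)$ denotes the demeaned and
detrended Brownian motion associated with $(\varepsilon_t)_{t\in\mathbb{N}}$.

The analogous results also hold for the demeaned series $\tilde a_t := a_t - \langle a_t, 1\rangle$ where $\tilde W :=
W - (\int W(s)ds)$ appears in the asymptotic distributions.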

Proof: (I) $\langle v_t, \varepsilon_t\rangle = O(Q_T)$ and $\langle v_t, v_t\rangle - Ev_tv_t' = O(Q_T)$ follow from Theorem 7.4.3
of Hannan and Deistler (1988). $\langle x_t, x_t\rangle = O(T\log\log T)$ follows from Theorem 3 of Lai and
Wei (1983), $\langle x_t, \varepsilon_t\rangle = O(\log T)$ from Corollary 2 of Lai and Wei (1982). Both results only
deal with the univariate case but the extension to the multivariate situation is obvious. This
result also implies $\langle x_t, v_t\rangle = O(\log T)$ (see Bauer, 2009, Lemma 4). The same result applies
for $n_t$ in place of $x_t$ by splitting $n_t = c_n(1)x_t + n_t^*$ (Beveridge-Nelson decomposition, see e.g.
Phillips and Solo, 1992) where

$$\begin{aligned}
n_t^* = n_t - c_n(1)x_t &= \sum_{j=0}^{t-1} C_{n,j}x_{t-j} - \sum_{j=0}^{\infty} C_{n,j}x_t = (C_{n,0} - c_n(1))x_t + \sum_{j=1}^{t-1} C_{n,j}x_{t-j} \\
&= (C_{n,0} - c_n(1))\varepsilon_t + (C_{n,0} - c_n(1))x_{t-1} + \sum_{j=1}^{t-1} C_{n,j}x_{t-j} \\
&= (C_{n,0} - c_n(1))\varepsilon_t + (C_{n,0} + C_{n,1} - c_n(1))(\varepsilon_{t-1} + x_{t-2}) + \sum_{j=2}^{t-1} C_{n,j}x_{t-j} \\
&= \sum_{j=0}^{t-1} C^*_{n,j}\varepsilon_{t-j}
\end{aligned}$$

where $C^*_{n,i} := -\sum_{j=i+1}^{\infty} C_{n,j}$. Due to the summability assumptions on $C_{n,j}$ the transfer
function $c_n^*(z) := \sum_{i=0}^{\infty} C^*_{n,i}z^i$ fulfills the properties of Theorem 7.4.3 of Hannan and Deistler
(1988). The result then follows from the assumed non-singularity of $c_n(1)$.
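
The Beveridge-Nelson splitting used in this step can be checked numerically. The following is a minimal sketch, assuming a finite-order transfer function (so that all sums are finite) and the convention that $x_s$ and $\varepsilon_s$ vanish before the start of the sample; the function name `beveridge_nelson` and the simulated coefficients are illustrative only and are not part of the paper.

```python
import numpy as np

def beveridge_nelson(C, eps):
    """Split n_t = sum_j C_j x_{t-j} into c(1) x_t + n*_t.

    C   : array of shape (q + 1, n, n) holding C_0, ..., C_q (finite order
          is an assumption of this sketch).
    eps : array of shape (T, n) holding the innovations.
    """
    T, n = eps.shape
    q = C.shape[0] - 1
    x = np.cumsum(eps, axis=0)            # x_t = eps_1 + ... + eps_t
    c_one = C.sum(axis=0)                 # c(1) = sum_j C_j
    # C*_i = - sum_{j > i} C_j (the last entry is an empty sum, i.e. zero)
    C_star = np.stack([-C[i + 1:].sum(axis=0) for i in range(q + 1)])
    n_t = np.zeros((T, n))
    n_star = np.zeros((T, n))
    for t in range(T):
        for j in range(min(q, t) + 1):   # terms before the sample start are zero
            n_t[t] += C[j] @ x[t - j]
            n_star[t] += C_star[j] @ eps[t - j]
    return n_t, x @ c_one.T, n_star

rng = np.random.default_rng(0)
C = 0.5 * rng.normal(size=(4, 2, 2))
eps = rng.normal(size=(200, 2))
n_t, trend, n_star = beveridge_nelson(C, eps)
# The residual of the identity n_t = c(1) x_t + n*_t is at rounding level.
print(np.max(np.abs(n_t - trend - n_star)))
```

The stationary remainder `n_star` is a moving average of the innovations with coefficients $C^*_{n,i}$, which is why the bounds for stationary processes carry over to it.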

The univariate version of $\langle x_{t,1}, x_{t,1}\rangle^{-1} = O(Q_T^2)$ is contained in Lai and Wei (1982, p. 163).

The multivariate version is shown in Bauer (2009).

(II) Since $\sum_{t=1}^{\lfloor T\omega\rfloor} \varepsilon_t/\sqrt{T} \Rightarrow W(\omega)$ (Davidson, 1994, Theorem 27.17) the convergence of
$\langle v_t, n_t\rangle$ is e.g. given in Park and Phillips (1988, Lemma 2.1 (e)). The result for $T^{-1}\langle n_t, n_t\rangle$ is
stated in part (c) of the same lemma. The central limit theorem is standard (cf. e.g. Hannan
and Deistler, 1988, Lemma 4.3.4) since $(\varepsilon_t)_{t\in\mathbb{N}}$ is an ergodic square integrable martingale
difference sequence.

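(III) The proof is based on the block matrix inversion formula

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} = \begin{bmatrix} A^{-1} & 0 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} -A^{-1}B \\ I \end{bmatrix} (D - CA^{-1}B)^{-1} \begin{bmatrix} -CA^{-1}, & I \end{bmatrix} \qquad (8)$$

applied to $\langle x_t, x_t\rangle$. As an example consider

$$\begin{aligned}
\langle \varepsilon_t, v_t^{\pi}\rangle &= \langle \varepsilon_t, v_t\rangle - \langle \varepsilon_t, x_t\rangle\langle x_t, x_t\rangle^{-1}\langle x_t, v_t\rangle \\
&= \langle \varepsilon_t, v_t\rangle - \langle \varepsilon_t, x_{t,\bullet}\rangle\langle x_{t,\bullet}, x_{t,\bullet}\rangle^{-1}\langle x_{t,\bullet}, v_t\rangle - \langle \varepsilon_t, x_{t,1}^{\pi}\rangle\langle x_{t,1}^{\pi}, x_{t,1}^{\pi}\rangle^{-1}\langle x_{t,1}^{\pi}, v_t\rangle
\end{aligned}$$

where $x_{t,1}^{\pi} := x_{t,1} - \langle x_{t,1}, x_{t,\bullet}\rangle\langle x_{t,\bullet}, x_{t,\bullet}\rangle^{-1}x_{t,\bullet}$. Therefore $\langle x_{t,1}^{\pi}, x_{t,1}^{\pi}\rangle = \langle x_{t,1}, x_{t,1}\rangle - O(\log T)O(1)O(\log T)$
and $\langle x_{t,1}^{\pi}, v_t\rangle = \langle x_{t,1}, v_t\rangle - O(\log T) = O(\log T)$ by (I). Thus

$$\begin{aligned}
\langle \varepsilon_t, v_t^{\pi}\rangle &= \langle \varepsilon_t, v_t\rangle - \langle \varepsilon_t, x_{t,\bullet}\rangle\langle x_{t,\bullet}, x_{t,\bullet}\rangle^{-1}\langle x_{t,\bullet}, v_t\rangle + o((\log T)^3/T) \\
&= \langle \varepsilon_t, v_t\rangle - \langle \varepsilon_t, x_{t,\bullet}\rangle (Ex_{t,\bullet}x_{t,\bullet}')^{-1}Ex_{t,\bullet}v_t' + o(T^{-1/2})
\end{aligned}$$

since $\langle \varepsilon_t, x_{t,\bullet}\rangle = O(Q_T)$ and $(Ex_{t,\bullet}x_{t,\bullet}')^{-1}Ex_{t,\bullet}v_t' - \langle x_{t,\bullet}, x_{t,\bullet}\rangle^{-1}\langle x_{t,\bullet}, v_t\rangle = O(Q_T)$ as required.
For $\langle v_t^{\pi}, w_t^{\pi}\rangle = \langle v_t, w_t^{\pi}\rangle$ the same arguments apply with the exception that now $\langle v_t, w_t\rangle = O(1)$
rather than $O(Q_T)$ leading to the second claim. The other claims follow in a similar manner
from the bounds achieved under (I).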
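
The block matrix inversion formula (8) that drives this part of the proof can be verified on a small example. The sketch below is only an illustration under the assumption that both A and the Schur complement D - C A^{-1} B are invertible; the helper name `block_inverse` and the simulated moment matrix are hypothetical.

```python
import numpy as np

def block_inverse(A, B, C, D):
    """Inverse of [[A, B], [C, D]] built from the Schur complement of A."""
    p, q = A.shape[0], D.shape[0]
    A_inv = np.linalg.inv(A)
    schur_inv = np.linalg.inv(D - C @ A_inv @ B)
    base = np.zeros((p + q, p + q))
    base[:p, :p] = A_inv                         # first summand: diag(A^{-1}, 0)
    left = np.vstack([-A_inv @ B, np.eye(q)])    # [-A^{-1} B; I]
    right = np.hstack([-C @ A_inv, np.eye(q)])   # [-C A^{-1}, I]
    return base + left @ schur_inv @ right

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
M = X.T @ X / 50                                 # a sample second moment matrix
A, B, C, D = M[:2, :2], M[:2, 2:], M[2:, :2], M[2:, 2:]
print(np.allclose(block_inverse(A, B, C, D), np.linalg.inv(M)))
```

In the proof the second summand is what produces the extra term involving the residual $x_{t,1}^{\pi}$ after the stationary block has been projected out.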

(IV) Only the results for detrending are shown, the analogous statements for the demeaned
series are obvious from the given results. The derivations here use Lemma 1, p. 121 of Sims,
Stock and Watson (1990). Lemma 1 (g) shows that $\langle v_t, d_t\rangle D_d = O_P(T^{-1/2})$ for stationary
$(v_t)_{t\in\mathbb{N}}$ and $\langle x_t, d_t\rangle D_d = O_P(T^{1/2})$ for integrated $(x_t)_{t\in\mathbb{N}}$.

For $\langle \varepsilon_t, \tilde x_t\rangle = \langle \varepsilon_t, x_t\rangle - \langle \varepsilon_t, d_t\rangle\langle d_t, d_t\rangle^{-1}\langle d_t, x_t\rangle$ note that the first term converges in distribution
according to (II). For the second term note that $\sqrt{T}\langle \varepsilon_t, d_t\rangle D_d \xrightarrow{d} [\int dW, \int u\,dW]$, $D_d\langle d_t, d_t\rangle D_d$
converges to a constant nonsingular matrix and $T^{-1/2}D_d\langle d_t, x_t\rangle \xrightarrow{d} [\int W, \int uW]'$. The last
two statements follow from Lemma 1 (a) and (c) of Sims et al. (1990). Therefore

$$\langle \varepsilon_t, \tilde x_t\rangle \xrightarrow{d} \int dW\,W' - \left[\int dW, \int u\,dW\right] \begin{bmatrix} 1 & 1/2 \\ 1/2 & 1/3 \end{bmatrix}^{-1} \left[\int W, \int uW\right]'.$$

This shows that the Brownian motion in the limiting expression is demeaned and detrended.

If $\varepsilon_t$ is replaced by $v_t$ convergence in distribution still holds, but the limits change.

The evaluations for $T^{-1}\langle \tilde x_t, \tilde x_t\rangle$ follow the same lines and are omitted.

Decomposing $n_t = c_n(1)x_t + n_t^*$ as above shows that in the above calculations $x_t$ can be
replaced with $n_t$ without changing the orders of convergence.

Finally if the time trend is omitted and only demeaning is performed the results can be
shown analogously using the arguments given above. □
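
The closed form of the demeaned and detrended Brownian motion in part (IV) can be compared with an explicit least squares detrending of a scaled random walk. This is a rough numerical sketch, assuming Riemann sums as stand-ins for the integrals and a univariate process; the names `detrend_ols` and `detrend_limit_formula` are made up for this illustration.

```python
import numpy as np

def detrend_ols(x):
    """Residuals of a least squares regression of x on a constant and a linear trend."""
    T = x.shape[0]
    d = np.column_stack([np.ones(T), np.arange(1, T + 1)])
    coef, *_ = np.linalg.lstsq(d, x, rcond=None)
    return x - d @ coef

def detrend_limit_formula(w):
    """W(u) - (int W)(4 - 6u) - (int sW)(12u - 6) with integrals replaced by averages."""
    T = w.shape[0]
    u = np.arange(1, T + 1) / T
    int_w = w.mean()              # approximates int_0^1 W(s) ds
    int_sw = (u * w).mean()       # approximates int_0^1 s W(s) ds
    return w - int_w * (4 - 6 * u) - int_sw * (12 * u - 6)

rng = np.random.default_rng(2)
T = 5000
w = np.cumsum(rng.normal(size=T)) / np.sqrt(T)   # scaled random walk on [0, 1]
gap = np.max(np.abs(detrend_ols(w) - detrend_limit_formula(w)))
print(gap, np.max(np.abs(w)))   # the gap is small relative to the path itself
```

The two versions differ only because the discrete moment matrix of (1, t/T) deviates from its limit with entries 1, 1/2, 1/3 by terms of order 1/T.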



Lemma A. 2 Under the assumptions of Theorem 3.3 the following holds true: 
(Azt,zt) - A AzAz 



JdB[ 



W]' + o P (l) 



Aaz,Am) 



= [^A^r^r+Mi) o P (i)], 

C*,l:2(l) 
C z ,u(l) 

T- x z T ,2z' T ± + Op(K~ 2 ) + Op(l/VTK) 
A^ z + P {(K/Tf/ 2 ) P (l/VKT) 

o p {k- 2 ) + o p (i/Vkt) 

Op(K- 2 ) 



Op(K- 2 ) + O p (1/VKT) 
P (K- 2 ) 



where B = 

Here the notation refers to the transformed vectors z t = [(z tj i)', (ztp)', (-^,3)', (zf 2 )'}' 

where the nonstationary components of both vectors (first block row) and the stationary com- 
ponents (second block row) of these vectors are separated. K denotes the kernel bandwidth 
parameter (see Assumptions K). Furthermore A w> a v = Kwtv' t for stationary processes (vt)tez 
and (w t )tez- 

The proof of all but the last of these facts can be found in Phillips (1995, Lemma 8.1). 
The last fact can be easily derived from the infinite sum representation of A Z: a v . 



Lemma A. 3 Let b r = OG' where G'S P = I n . 



Here S p £ M" 1 ^" denotes a selector matrix 



(i.e. a matrix composed of n columns of I mr ). Let (3 r denote an estimator of b r such that 
\\Pr ~ W\\ft = o(a T ). 

Assume that — H + ||ir r = o(T~ e ) and \\Ep — H~||ir r = o(T~ € ) (E + and H~ being non- 
singular) for some e > and let H + OGHp be obtained as the best (in Frobenius norm) 
rank n approximation of E + (3 r Ep . Further let := (0'(E+) 2 0)~ 1 0'(3+) 2 assuming that 
||(0'(H + ) 2 Om<oo. 



Then for T large enough G can a.s. be chosen such that G'S P = I n . Further 

G'-G' = 0\p r - b r )(I p - S P G') + o(a T ). (9) 

Proof: Since \\j3 r — 6 r ||_p r = o(ax) it follows from the boundedness assumption on that 
0^(3 r S p — > (3 r S p = I n . Since OG' is a best approximation to b r based on (3 r in a weighted 
least squares sense it follows that 

|| OG' - b r \\ Fr < \\OG' - P r \\ Fr + \\~pr - b r \\ F r = 0(||/3 r - b r \\ Fr ) = o(l). 

It follows that O^OG'Sp — > 0^b r S p = I n . Therefore G subject to the restriction G'S P = I n 
is well defined a.s. for T large enough. It follows that ||(OG' — b r )S p \\ Fr = \\0 — 0\\ Fr = o(l). 
Letting 6^ := (& (E + ) 2 O)' 1 & (E + ) 2 one obtains 

G' — G' = O f /3 r - ] b r = (O t - O f )~p r + 0\p r - b r ). 

Then ||0 — 0|| = o(l) and ||H+ — £+|| = o(T _e ) together with the bounds on the norms 
||(0 / (H + ) 2 0)- 1 ||, ||S + || and ||0|| Fr imply ||Ot _ d^\\ Fr = o(l). Using the fact that (G' - 
G')S P = and b r (I - S P G') = one obtains 

G' - G' = 0\~p r - b r )(I - SpG') + (O f - O*)0 r - b r )(I - S P G') 

which proves the lemma. □ 
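
Lemma A.3 works with the best rank n approximation, in Frobenius norm, of a weighted coefficient estimate. A minimal sketch of that construction via a truncated singular value decomposition follows; the weighting matrices `W_plus` and `W_minus`, the function name and the simulated data are assumptions of this example, chosen only to illustrate the weighted least squares sense mentioned in the proof.

```python
import numpy as np

def weighted_rank_approx(beta, W_plus, W_minus, n):
    """Rank-n matrix minimising ||W_plus (X - beta) W_minus||_Fr over all rank-n X."""
    U, s, Vt = np.linalg.svd(W_plus @ beta @ W_minus, full_matrices=False)
    low_rank = (U[:, :n] * s[:n]) @ Vt[:n]       # truncated SVD (Eckart-Young)
    # Undo the weighting; both weights are assumed to be nonsingular.
    return np.linalg.solve(W_plus, low_rank) @ np.linalg.inv(W_minus)

rng = np.random.default_rng(3)
s_dim, m_dim, n = 4, 5, 2
b_r = rng.normal(size=(s_dim, n)) @ rng.normal(size=(n, m_dim))   # true rank n
beta_hat = b_r + 0.05 * rng.normal(size=(s_dim, m_dim))           # noisy estimate
W_plus = np.diag(rng.uniform(0.5, 2.0, size=s_dim))
W_minus = np.diag(rng.uniform(0.5, 2.0, size=m_dim))

approx = weighted_rank_approx(beta_hat, W_plus, W_minus, n)
print(np.linalg.matrix_rank(approx))
print(np.linalg.norm(W_plus @ (approx - beta_hat) @ W_minus))
```

Because the approximation error is bounded by the distance of the estimate to any rank n matrix, in particular to the true coefficient, the triangle inequality in the proof applies.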

Lemma A. 4 Let At = A' T = Aq + 6 A , Aq = A' and Bt = Bq + 5 b be two sequences of 
matrices A F G M. axa and Bt £ M axf> . Aq and Bq are possibly random matrices. Assume that 
all matrices are partitioned as 

A ,n + O P (a T ) P (a T ) 
P (a T ) A 0t 22 + Op(a T ) 

B ,u + P (b T ) P {b T ) 
P (b T ) B 0i2 2 + Op(b T ) 

such that At,h € R cxc , .Bt,ii £ M cxc and all other matrices have the corresponding di- 
mensions. The subscripts for all matrices indicate the corresponding blocks. Assume that 
A o,u = Op(lMoj 2 = O p {1). Finally let J T := B'^A^Bt - B^Aq^Bq. 



A t = 
Bt = 



At,u At, 12 

At,21 At,22 

Bt,u Bt,\2 

Bt,21 Bt,22 



A QA i + <$£ 

°21 



°12 
A),22 + S A 



22 



-Bo, 11 + 5n 
5g 



°12 



Bo,22 + ^22 



Then if ap and bp are such that ap — > 0, bp — > we have 



Jt,U - (^f^'^ojl-^O,!! +- B 0,ll j4 0,ll^fl ~ ^O.llAljAlA),!!^,!! 



+ 



($2l)' ~ ^0,11^0,11^12 A),22 $21 ~ ^l^O.ll-^O.ll + Op(ap + bp) 



Jt,21 — (^/A),!!^,!! + [^0,22 + $22] A),22 



821 — #n A n \i B[ 



21^ 1 0,11- D 0,11 



ryl 4-I 4-1 
- D 0,22 /1 0,22°22 /i 0,22 



^21^-0,11-^0,11 



<^T,22 — 



~~ $21 A^ll^)' (A),22 ~~ ^0,22(^22 ~~ ^21 A), 11 ^12 ~~ ^22^0,22^22)^0,22) A),22 
+-^0,22 (A),22 ~~ ^0,22(^22 ~~ $21 A),11^L2 ~~ ^2 A),22^22) A)^) ($22 ~ ^21^0,11^) 
~B'o t 22 (^0,22(^22 ~~ $21 A),11^12 ~~ ^22^0,22^22)^0,22) ^0,22 

+ ($22 ~ ^A^ll^/A^^i ~ ^21^0,11^) + (^2)' ^0,11^2 + Op(a T + bp). 

Therefore Jt,u = Op(a 2 r , + b 2 ^) and Jt,12 = Op(ap + bp), Jt,22 = Op{ap + bp). All evaluations 
hold if all in probability statements are exchanged by almost sure convergence. 

Proof: The proof follows from straightforward algebraic manipulations using the block 
matrix inversion 



A: 



o 



A' 1 

^T,ll 





+ 



Arp -, -, A 



T,U^T,12 
I 



^t,22 - ^4t,21 ^4^1 1^,12) -At^iAj,^,! 



(10) 



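noting that

$$A_{T,11}^{-1} = (A_{0,11} + \delta^A_{11})^{-1} = A_{0,11}^{-1} - A_{0,11}^{-1}\delta^A_{11}A_{0,11}^{-1} + O_P(a_T^2)$$

since $a_T \to 0$ and $A_{0,11}^{-1} = O_P(1)$ by assumption. Similarly

$$(A_{T,22} - A_{T,21}A_{T,11}^{-1}A_{T,12})^{-1} = A_{0,22}^{-1} - A_{0,22}^{-1}\left(\delta^A_{22} - \delta^A_{21}A_{0,11}^{-1}\delta^A_{12} - \delta^A_{22}A_{0,22}^{-1}\delta^A_{22}\right)A_{0,22}^{-1} + o_P(a_T^2)$$

follows. The remaining calculations are tedious but straightforward and hence omitted. □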
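
The first order expansion of the inverse used in this proof has a remainder that is quadratic in the size of the perturbation. The sketch below illustrates this for a deterministic perturbation; the matrices, the helper name `first_order_inverse` and the chosen perturbation sizes are assumptions made for the illustration.

```python
import numpy as np

def first_order_inverse(A0, delta):
    """Linear approximation A0^{-1} - A0^{-1} delta A0^{-1} of (A0 + delta)^{-1}."""
    A0_inv = np.linalg.inv(A0)
    return A0_inv - A0_inv @ delta @ A0_inv

rng = np.random.default_rng(4)
M = rng.normal(size=(4, 4))
A0 = M @ M.T + np.eye(4)              # symmetric with spectral norm of inverse <= 1
D = rng.normal(size=(4, 4))
D = (D + D.T) / 2

errors = {}
for a_T in (1e-1, 1e-2, 1e-3):
    delta = a_T * D / np.linalg.norm(D, 2)       # perturbation of spectral norm a_T
    exact = np.linalg.inv(A0 + delta)
    errors[a_T] = np.linalg.norm(exact - first_order_inverse(A0, delta), 2)
    print(a_T, errors[a_T])           # shrinks roughly like a_T ** 2
```

The remainder equals the tail of the Neumann series, which is bounded by $a_T^2/(1 - a_T)$ in this normalisation.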



Lemma A. 5 Define the two generalized eigenvalue problems: 

(a) QG = MGR 2 , (b) <lf = 'If 6 2 . 

where G G R mrXn ,T G R mrXn ,R 2 G M nxn ,6 G R nxn . Further ^ and 9 are assumed to be 
nonsingular a.s. and O is diagonal. 

(I) If J ■= Q-$ = 0{a T ) andb zz := M — * = 0(b T ) (where a T -> 0,b T -> for T -> 00) 
then there exists matrices G and R solving the eigenvalue problem (a) and matrices T and 6 



solving (b) such that V S p = I n (where S p denotes a selector matrix, i.e. a matrix consisting 
of columns of the identity matrix), G — T = 0{ot + br),R — O = 0(ot + &t)- 

(II) Further let SG := G - f . Then the following two equations hold (V* := (f '*f ) _1 f '): 

$5G-y5GR 2 = 5 ZZ GR 2 + *f (R 2 - 6 2 ) - JG, (11) 
(I m - tff f f )*<5Gi? 2 = (I m - *fr^) [JG - 5 ZZ GR 2 + <&5G] (12) 

(III) By transforming G = G{S' p G)^ 1 it follows that G solves the generalized eigenvalue 
problem (a) with matrix R 2 = (S' p G) R 2 (fi^G)" 1 . Then G - f = 0(a T + b T ),R- G = 
0(aT + 6t). .Here -R is noi necessarily block diagonal. 

Proof: Solutions to the generalized eigenvalue problem are not identified. If all eigenvalues
are distinct then fixing the sign of one nonzero entry in each column of $\Gamma$ results in a
unique solution (see e.g. Bauer et al., 1999, p. 1246, for a discussion). If there are repeated
eigenvalues then more restrictions need to be introduced in order to achieve identification. It
follows from operator theory (cf. e.g. Chatelin, 1983) that there exist normalizations such that
the solution to the eigenvalue problem depends analytically on the matrix which is decomposed,
i.e. such that $G - \Gamma = o(1)$ a.s. In these normalizations $R^2$ is not necessarily diagonal
while still being block diagonal where the blocks correspond to the identical eigenvalues in $\Theta$.

In particular let the sequence of matrices $M_T \to M_0$. Let $\hat\varphi_i$ denote the matrix whose
columns span the eigenspaces of $M_T$ corresponding to the eigenvalues $\hat\lambda_j \to \lambda_{0,i}, j = 1, \dots, m_i$
where $m_i$ denotes the multiplicity of the eigenvalue $\lambda_{0,i}$ of $M_0$ with corresponding eigenspace
spanned by the columns of the matrix $\varphi_{0,i}$. Here it is assumed that the normalization $\hat\varphi_i'\varphi_{0,i} =
I_{m_i} = \varphi_{0,i}'\varphi_{0,i}$ is chosen. Then it holds that

$$\hat\varphi_i - \varphi_{0,i} = (\lambda_{0,i}I - M_0)^{\dagger}(M_T - M_0)\varphi_{0,i} + O(\|M_T - M_0\|^2). \qquad (13)$$

Here $\dagger$ denotes the Moore-Penrose pseudo-inverse. In particular let $M_T = M^{-1}Q$ and
$M_0 = \Psi^{-1}\Phi$ and the columns of $G$ and $\Gamma$ equal $\hat\varphi_i$ and $\varphi_{0,i}$ respectively. The condition of
nonsingularity for $\Theta$ ensures separation from the kernel of $M_0$ and hence correct specification
of the size of $\Gamma$. Then let $\tilde\Gamma := \Gamma(S_p'\Gamma)^{-1}$, $\tilde G := G(S_p'\Gamma)^{-1}$. Clearly $\tilde\Gamma$ is a solution to the
problem (b) fulfilling the assumption of the Lemma.

The assumptions imply that $M_T - M_0 = O(a_T + b_T)$ showing $G - \Gamma = O(a_T + b_T)$ and
consequently $\tilde G - \tilde\Gamma = O(a_T + b_T)$. The order of convergence for $R - \Theta$ then follows from the
fact that all other terms in (a) and (b) differ only by this order.
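
The first order eigenvector expansion (13) can be illustrated in the simplest setting of a symmetric matrix with a simple eigenvalue. The following sketch assumes exactly that special case (the paper's matrices need not be symmetric); the helper name `eigvec_first_order` and all numerical values are hypothetical.

```python
import numpy as np

def eigvec_first_order(M0, MT, lam0, phi0):
    """phi0 + (lam0 I - M0)^+ (MT - M0) phi0, the linear part of expansion (13)."""
    k = M0.shape[0]
    return phi0 + np.linalg.pinv(lam0 * np.eye(k) - M0) @ (MT - M0) @ phi0

rng = np.random.default_rng(5)
M0 = np.diag([3.0, 2.0, 1.0])
lam0, phi0 = 3.0, np.array([1.0, 0.0, 0.0])    # simple eigenvalue and its eigenvector

E = rng.normal(size=(3, 3))
E = (E + E.T) / 2
E /= np.linalg.norm(E, 2)                      # symmetric direction of unit spectral norm
eps = 1e-3
MT = M0 + eps * E

vals, vecs = np.linalg.eigh(MT)                # eigenvalues in ascending order
phi_hat = vecs[:, -1]                          # eigenvector of the largest eigenvalue
phi_hat = phi_hat / (phi0 @ phi_hat)           # normalisation phi0' phi_hat = 1

err = np.linalg.norm(phi_hat - eigvec_first_order(M0, MT, lam0, phi0))
print(err, eps ** 2)                           # the error is of the order eps ** 2
```

The normalisation step mirrors the requirement $\hat\varphi_i'\varphi_{0,i} = I_{m_i}$; without it the eigenvector is only determined up to scale and sign.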



(II) Equation (11) follows from simple algebraic manipulations using the definitional equa- 
tions (a) and (b) and Q = <l + J, M = * + 5 ZZ , G = f + SG. Premultiplying (11) with ft a 
rearranging of terms leads to 

R 2 - e 2 = f f [$<5G - ^5GR 2 - 5 ZZ GR 2 + JG}. (14) 

Inserting this into (11) shows that 

(I m - *f ft) [*(5G - ^(5Gi? 2 - 5 ZZ GR 2 + JG] = 0. 

This shows equation (12). 

(III) Follows immediately from (I). □



A.2 Proof of Theorem 3.1 



(I) Note that diag(A(L)J Cr , I mr - Cr )H' r zl = c v (L)et- Consequently the dimension of the coin- 
tegrating space of (40teN is equal to m r — c r . The claim on the dimension of the cointegrating 
space for (y t — b u zf)t£i also follows immediately from this representation. 

(II) Note that yt denotes a transformation of yt — b u zf = b r z\ + Ae^ = b r H r H' r zl + Ket 
which equals the estimation equation with the effects of removed. Here we use that H r 
was defined to be orthogonal. Next it is proved that matrices T y and 7^ r transforming the 
equation into the required form exist. 

Let T y 6 W xs and nonsingular C G W r>CCr be chosen such that 







pSXC r 



It is easy to see that such choices always exist since c y denotes the rank of b r H r \\. Then 
in y t := T y (yt — b u zf) = T y (b r H r H' r zl + ke t ) the first c y coordinates are integrated, the 
remaining being stationary. Choosing 7^ r = diag(G _1 , I mr - Cr )H' r we obtain that the first c r 
components of 7^ r z[ are integrated, the remaining ones being stationary. Using the above 
equation we obtain 



Tb T~ l 
'y°r I z ,r 



b. 



r,13 



6 r , 23 j 

Then the choice 

" / 6 r ,i 3 

T-.r o/o r_ 

I 

leads to the required representation. The remaining claims are straightforward to derive. 
Details are omitted. 



A.3 Proof of Theorem 3.2 
A.3.1 Consistency

Note that all estimators can be obtained in a two step procedure by first concentrating 
out zf and afterwards maximizing the quasi likelihood with respect to f3 r (Frisch-Waugh- 
Lovell equations). For fixed estimate $ r the least squares estimator for b u is given by $ u = 
(y t - p r z r t , zf){zf, zf)-K Therefore 

K-hu = (brzl+buzf+Aet-Przl-buZ^z^iz^z?)- 1 = (As t , %) <4\ ^)- 1 +(6 r -/3 r ) {zl , %) (z?, 

(15) 

This formula applies for the restricted and the unrestricted estimator. For the first term 
note that 

( £ ti z t)( z t 'i z t) = ( £ t,7~z,uZt)(T z ,uZt ,Tz,u z t) 1 T z ,u = ( £ t, zf)(zf , zf) 1 T z ,u- 

Now (Bauer, 2009) implies that 

{et^H&z?)- 1 = [0(Pt),0(Qt)] 

where Q T = y / \og\ogT/T and P T = y^ogTloglog T/T 2 . 

In order to simplify the notation we use the symbols yf := T y {yt — {yt, z'l) ( z t > z 't)~ l zf) 
and zl := T z , r {zl — (zl,zf}(zf,zf}^ 1 zf) throughout the proof. Here the superscript r corre- 
sponding to zl will be omitted for notational simplicity. The corresponding symbols and 
z^i denote the corresponding limit (a.s.) for T — > oo (where the symbols are only used if the 
limit exists). In general the residuals of the regression of any variable onto z^,t = 1, . . . ,T 
will be denoted using the superscript n and n will denote the corresponding limit (where it 
exists) . 
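
The two step procedure referred to above (first concentrating out the unrestricted regressors, then regressing residuals on residuals) rests on the Frisch-Waugh-Lovell result. A small sketch with simulated data follows; variable names such as `z_r`, `z_u` and `residualize` are assumptions of this example, and coefficients are stored with regressors in rows, i.e. transposed relative to the paper's $b_r$.

```python
import numpy as np

def residualize(a, z):
    """Residuals of a least squares regression of the columns of a on z."""
    coef, *_ = np.linalg.lstsq(z, a, rcond=None)
    return a - z @ coef

rng = np.random.default_rng(6)
T = 400
z_r = rng.normal(size=(T, 3))                                  # restricted regressors
z_u = np.column_stack([np.ones(T), rng.normal(size=(T, 1))])   # regressors to concentrate out
y = z_r @ rng.normal(size=(3, 2)) + z_u @ rng.normal(size=(2, 2)) + rng.normal(size=(T, 2))

# Joint regression on both sets of regressors; keep the block belonging to z_r.
full, *_ = np.linalg.lstsq(np.hstack([z_r, z_u]), y, rcond=None)
beta_r_full = full[:3]

# Two step version: remove z_u from y and z_r, then regress residuals on residuals.
y_pi, z_pi = residualize(y, z_u), residualize(z_r, z_u)
beta_r_fwl, *_ = np.linalg.lstsq(z_pi, y_pi, rcond=None)

print(np.allclose(beta_r_full, beta_r_fwl))
```

This is why the analysis can proceed with the residual series only, treating the coefficient of the concentrated regressors afterwards through (15).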

Using the same result and the Frisch-Waugh-Lovell equations for the first term and the 
orders of convergence stated in Lemma A.l 

(it^DiziJi)- 1 = [o(p t ),o(Qt)\, 

(z^zDiz^zl)- 1 = (5 t ,5r)diag(T- 1 /,/)((z i «,zr>diag(r- 1 /,/))- 1 

" 0((loglogT) 2 ) 0(logT(loglogT) 2 ) " 
0(1) 0(1) 

Therefore it is sufficient to show that (b r — (3 r ) converges to zero. To this end a transformed 
problem such that all transformed matrices converge to nonrandom matrices is analyzed first. 



In this setting it will be possible to provide a.s. bounds for convergence rates. Afterwards 
the solution to the transformed problem is related to the solutions of the original problem. 

Thus consider the transformed problem using the transformation matrices D y = diag((z^i, z^i) 
and D r = diag((5t i i, i^i)" 1 / 2 , 1) respectively to transform the input and output of the esti- 
mated regression according to yt = D y yt,Zf = D r Zf The transformed estimator $ r : = 
DyPoLS,rDj l converges to b r = T y b r T~ l = OG' where the last equation defines O and 
G. Adapting the weighting E + := E + D~ 1 we obtain E + = diag(/ Cj/ , (Ey^y^)" 1 / 2 ) and 
E + — E + = O ((log T) log log T/^/T) as needed in Lemma A. 3. This is obtained using the 
Cholesky factor as the square root of a matrix which is a differentiable operation. With this 
new normalization we obtain 2 

S$r ■ = P0LS,r ~ W = D y O LS,r ~ K)^ 1 = Dy[0(P T ) , O(Qt)]^ 1 

" 0(logT(log log Tf/T) 0((log log Tf' 2 /T) ' 
0((logTf/yVT) 0(Q T ) J' 

pRRR,r-b r = OG' - OG' = (O - 0)G' + 0(G' - G'), 
O - O = (Prrr^t - b r )S p , 

G'-G' = 0\j3oLS,r-b r )(I-S p G') + (& -O^OLS,r-b r )(I-S p G') 



where 



Sm 



I 



/ 





,G 
o r 3 , 2



, / - s p g' = i - 



I 
1 32












Here the first two block rows of S p correspond to zt,i and 2^2- The remaining two blocks 
correspond to 2^3. Since $rrr )T is a best rank n approximation to $ r (see the proof of 
Lemma A. 3) the rate of convergence of $ r implies that $RRR, r - K = 0((logT) 3 / 2 / '\fT). 
Consequently also O-O = 0((logT) 3 / 2 /y/T) and hence also Ot _ C»t = 0((log T) 3 / 2 /Vf) 
where & := (0'E\0)~ 1 0'E 2 l _. This shows the convergence rates for the solutions to the 
transformed problem. It thus remains to connect the solution to the original problem to the 
solution of the transformed problem. 

It is straightforward to see that the untransformed estimate G' = (& byfioLS^Sp) -1 ^ D y j3oLS 
such that G'S P = I n . According to the limits above 

tfDypoLS^D- 1 = mr + O^DyPoLS^D; 1 -b r ) 



2 Here and below we will not always use the tightest possible bounds but use powers of log(T) instead for 
readability. Improvements are possible but their practical merits must be doubted. 



= &OG' + O^Spr 

Now let D n = diag((z t ,i, z t ,i}~ 1/2 , In-c y ) such that D Z S P = S p D n . Then 

[&-G'] = D^iO^O + 0^5$ r S p )- 1 {O^OG' + 6^5$ r )D z -G f 

= Dn l {&0 + O t <5/3 r S , p)- 1 ((O t O + d^5$ r S p )G' - 6 ] 5$ r S p G' + d^S$ r )D z - G' 

= b-\o^o + dUprSpyWsfirii - s p G')b z 
= b- l (o^o + dUfirSpyWsfirDtii - s p g') 

= b- l 0^5$ r D z {I - S P G') + D' 1 ((log T) 3 /T)D Z 
= o^b~ l 8i3 r b z (i - S P G') + b- l O((\ogTf/T)b z 
= [0((log T) 4 /T, (log Tf/Vf )) 

where we have used that all terms have been shown above to be of order 0((log T) 3//2 /\/T). 
Further (due to the usage of the SVD and the corresponding orthogonality relations) 

0-0 = OG'(% \%)& - O = PolsM \%)& ~ O 

= (y t , ~zl)& -0 = (b r z t + e t , %)& - O 

= OG'(~z?,~z?)& -0 + (e t ,z?)& 

= OG'(zJ, zl)& -0 + 0(G' - &){%, %)& + (i t , %)& 

= 0(G' + 

where 

& := G{&(%,%)G)- X = Gb n (b n G'(zi^t)Gb n y x b n . 

It then follows from the orders of convergence provided in Lemma A.l that D z (z£, zJ)&D~ l 
O(logT). Consequently 

[&-(?](%,%)& = [0((logT) 4 /T),0((logT) 4 /VT]b^b z (~z?,~z?)& 
= [0((logTf/T),0((logTf/VT)}. 



Furthermore 



= [0(P T ),0(Q T )}b z 1 0(\ogT)b n 
= [0((logT) 3 /T),0((logT) 3 )/v / T)]. 



Together these orders imply 6-0 = [0((log T) 6 /T), 0((log Tf/VT)] and thus 
P R RR,r-b r = (6 - 0)G' + 0(G' - G') + (O — 0){& - G') = [0((logTf/T),0((logTf/Vf)]. 
Consequently we obtain from transforming (15) 

pRRR,u -6„ = 0((l0g Tf/Vf). 

This shows the convergence rates. 
A.3.2 Asymptotic Normality

In order to derive the asymptotic distribution of the estimator $\hat\beta_{RRR}$ the proof extends the
theory contained in Anderson (2002). Since the proof is rather lengthy, the main steps are
documented using lemmas summing up the main intermediate results.

Note that the RRR estimator is obtained from the singular value decomposition (using 
the symmetric matrix square roots) 

(v!,%)- 1/2 (v?,%)<%,%)- 1/2 = urv'. 

Then as in Anderson (2002) (1.10), p. 205, the reduced rank estimator can be obtained 

as 

TyP R RR,rT~ l = (y^%)G(G'(z^~zt)GT l G' 
where G = (zf , zj )~ l / 2 V n Tc £ M. mxn satisfies the following equations 

where R 2 = diag (r^,r|, ...,?„) denotes the matrix containing the squares of the n largest 
estimated singular values as its diagonal entries. The function of the transformation matrix 
To will become clear from the following. 
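
The construction of the reduced rank estimator from a singular value decomposition with symmetric matrix square roots can be sketched as follows. This is an illustration under simplifying assumptions: plain sample second moments are used, no regressors are concentrated out, no additional transformation of G is applied, and the names `inv_sqrt` and `reduced_rank_regression` are hypothetical.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse of the symmetric square root of a positive definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs / np.sqrt(vals)) @ vecs.T

def reduced_rank_regression(y, z, n):
    """Rank-n coefficient matrix (s x m) in y_t = beta z_t + u_t, rows of y, z are time points."""
    T = y.shape[0]
    S_yy, S_yz, S_zz = y.T @ y / T, y.T @ z / T, z.T @ z / T
    S_zz_is = inv_sqrt(S_zz)
    # SVD of the normalised cross moment matrix; rows of Vt are right singular vectors.
    U, r, Vt = np.linalg.svd(inv_sqrt(S_yy) @ S_yz @ S_zz_is)
    G = S_zz_is @ Vt[:n].T                       # n leading directions
    return S_yz @ G @ np.linalg.inv(G.T @ S_zz @ G) @ G.T

rng = np.random.default_rng(7)
T, s_dim, m_dim, n = 500, 4, 3, 2
z = rng.normal(size=(T, m_dim))
b_r = rng.normal(size=(s_dim, n)) @ rng.normal(size=(n, m_dim))   # true rank n
y = z @ b_r.T + rng.normal(size=(T, s_dim))

beta_rrr = reduced_rank_regression(y, z, n)
beta_ols = (y.T @ z) @ np.linalg.inv(z.T @ z)
print(np.linalg.matrix_rank(beta_rrr))
print(np.linalg.norm(beta_rrr - b_r), np.linalg.norm(beta_ols - b_r))
```

With n equal to the full number of regressors the same function reproduces the unrestricted least squares estimate, which makes the role of the discarded singular values explicit.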

Introduce the following notation (where in D z the subscript r is omitted for notational 
simplicity) : 

D z := D z T l l 2 = diag(T~ 1 / 2 /, /), D y = D y T l l 2 = diag^ 1 / 2 /, /), 
G := D^G, 

Q := {b z z^b y r t ){D y r t ,D y r t )- l {D y r t ,D z z*), 
M := (D z z?,D z ~z?), 



* : = 





(^3,^3) 








(^3' ^2) {yt,2iVt,2) (Vt,2^t,3) 





" / 







" / 





r = 








->r = 













^3,2 







^3,2 



A summary of the (unfortunately heavy) notation used can be found in Appendix B. The 
main guideline of the notation is to use Latin letters for matrices in which the stationary and 
the nonstationary subproblems are not separated (i.e. the off-diagonal blocks potentially are 
nonzero) and Greek letters for matrices for the decoupled problems. A bar indicates estimates 
(appropriately normalized so that convergence holds). This leads to two generalized eigenvalue 
problems related to SVDs: 

(a) QG = MGR 2 , (b) If = ff8 2 . 

Hence G denotes the solution to the original problem (a), f the solution to problem (b) 
where stationary and nonstationary components are separated. Consequently the solutions 
to (b) have the form: 

"1 ] f / 

(16) 

where the corresponding SVD for the stationary subproblem of (b) and its limit can be written 

as 

fe^ 3 )T0 2 = (-z^y^iy^y^iy^z^f, 
E4(4)'T0 2 = E~z*(yK )'m^ 2 )')- 1 Ey t n 2 (^ 3 )'T. 

Solutions to these equation are not unique. In light of Lemma A. 3 the restrictions 
T' 32 S Pj22 = I = r 3 2 S'p i 22 will be imposed. Here £ p ,22 is a suitable selector matrix, i.e. a 
matrix whose columns are columns of an identity matrix. W.r.o.g. it can be assumed that 
S' p22 = [1,0] by using an appropriate transformation T z . Note that this implies that ©2 and 
2 are not necessarily diagonal. Let S p = [S Pt i, S Pj2 ] where S' p l = [1,0] and S' p2 = [0, S p22 \. 
Then r'S p = I = f ' S p are sufficient restrictions to identify the solutions T and T. Analogously 
G' 32 S Pt 22 = I,Gi^\ = I,R = diag(Ri, R 2 ) identify a solution (asymptotically, see Lemma A. 3 
and Lemma A. 5). These solutions will be used in the following. Here ©2 and ©2 resp. denote 
the (2, 2) blocks of © = diag(7, ©2) and = diag(J, ©2) respectively. 

The relations between the various solutions to the generalized eigenvalue problem are
collected in Section B. Throughout the rest of the proof we will use the following notation for
blocks of matrices: For a matrix $X$ partitioned into blocks we let $X_{i,j}$ denote the blocks of
the matrix. If multiple blocks are included also the notation 'i : j' will be used indicating the
matrix built of blocks with indices i up to (and including) j. In order to denote block rows or
columns we use a colon for selecting the whole row or column. Hence e.g. $G_{3,2}$ denotes
the (3,2) block, $G_{1,:}$ the first block row and $G_{1:2,1}$ the first two block rows in the first block
column of the matrix $G$.

The next lemma establishes orders of convergence of the solutions to the generalized 
eigenvalue problems. 

Lemma A. 6 Let the assumptions of Theorem 3.2 hold. 

(I) Partition the matrices Q,M,^,^ according to the partitioning of z t denoting the var- 
ious blocks using subscripts. Then 

OpiT- 1 ' 2 ) 

.1/-*- Op{T~ 1 / 2 ) 

P {T~ l l 2 ) Op{T~ 1 ' 2 ) 



Jyz 



J yy 



r- 1/2 (^ 2 ,^i) T- xl \vWzl 2 ) 

P {T- V ) OpiT- 1 ) OpiT- 1 / 2 ) ' 
Op{T~ 1 / 2 ) OpiT- 1 / 2 ) J ' 

T-^^vlx) 



P {T- X ) OpiT- 1 ' 2 ) 
OpiT- 1 / 2 ) 



The terms $O_P(T^{-1})$ are $O((\log T)(\log\log T)^2/T)$ and the $O_P(T^{-1/2})$ terms are $O((\log T)(\log\log T)^2/T^{1/2})$.
(II) Let J := Q — «3>. To simplify notation define Zij := T -1 ^^, z^}, i,j = 1,2. Then 



J, 



h3 



~ Z H Z ll S ly] Z ll Z lj + Z il Z U S j 



yz 



+ [5% - ZaZn 1 *"]^^)')- 1 ^ - S^n^A + opiT- 1 ), 
J 3 , = 5f y Z^Z u + (z^y^iy^yl,)- 1 ^ - 5 2 y 1 y Z u 1 Z ll ] + op^ 1 ), 

(17) 

for i = 1,2, j = 1,2 where expressions for the remaining blocks of J follow from sym- 
metry. Hence Jij = Op(T~ l ) and indeed Jij = 0((logT) 3 /T) for i,j = 1,2. Further 
J 3i = O p {T- 1 / 2 ) and indeed J 3:i = 0((logT) 3 /T~ 1/2 ) for i = 1,2. J 3i3 = O p {T- 1 ) and 
J 33 = 0((logT) 3 /T) respectively. 

(Ill) 8G:=G-T = Op{T~ 1 / 2 ) and moreover 5G = 0((logT) 3 /T 1 ' 2 ). 

Proof: (I) The orders of convergence for the various entries of 5 ZZ and 5 yz follow from 
Lemma A.l. Details are omitted. 



(II) Set Bt := D y (iit , Zt)D z and At := D y (yf ,yf)D y in Lemma A. 4 where the par- 
titioning refers to nonstationary and stationary components in the various matrices. Note 
that for the in probability part ar = br = T -1 / 2 and for the almost sure convergence 
a T = br = logT(loglogT) 2 /T -1 / 2 fulfill the assumptions of Lemma A. 4. Also note that 
5§2 = 0,622 = simplifying the expression for J 3) 3. Then Lemma A. 4 proves this part of 
the lemma. The orders of convergence for the various entries of J follow from the equations 
given and the orders of convergence provided in Lemma A.l. Here also the uniform bound 
on the two and infinity norm of {zf 2 > z^ 2 )~ l > {Vt,2->Vt,2)^ 1 which are implied by Assumption P 
are used. 

(III) follows from (I) and (II) in combination with Lemma A. 5. □ 

The next lemma (proven in section A. 4) gathers more detailed results on the asymptotic 
properties of the entries of SG: 

Lemma A. 7 Let the assumptions of Theorem 3.1 hold. Then we obtain 
SG 1A = 0, 

<5Gi, 2 = -Z n l Z 12 5G 2>2 + Z^ l [5f z UM ~ ^i,3f 3) 2](/ - e 2 )- 1 + (T-V2), 
SG 2;1 = (T- 1 ^,^))- 1 [J 2 . M + (J 2 . li3 - e- 3 )^ 3 ,i] +o{T- 1 ) = 0((logT) 7 /T), 
5G 2 , 2 = (T- 1 ^.!,^.!))- 1 [J 2 .i, 3 f 3 , 2 e 2 - 2 - +o(T-V 2 ), 
SG 3A = S[J3 t i — Szl] + o(T -1 ), 
^3,3^3,1 = (/ - <^ 3 , ^ 3 )f 3, 2 f J i2 )[J 3 ,i - 5f z ] + oiT- 1 ) = 0((logT) 6 /T), 
5G 3 , 2 = o{T' l l 2 ) 

where z^.i = *t,2 ~ Z^Z^z^Szz' 1 = oJ z - Z 21 Z^bJ z , J 2A;1 = J M - Z 21 Z^-J 2;1 , 

S = (Z 33 - (^a,^)^,^)- 1 ^,^))- 1 
using Z33 = (z^ 3 ,z^ 3 ) and 

P33 = -^33 — -Z33r 3)2 r 3i2 Z33. (18) 

Next these approximations are linked to the estimate $rrr^- 

Lemma A. 8 Let the assumptions of Theorem 3.2 hold. 
(I) Then 



T y {hRRR,--h r )T z x D- z x = [VT<e t ,^ r )^Jfft + 



TL 

Vto 2 



8G'(L - MGG ] ) 



+ 

where 





o o VT[^-hm-^fM^' T ^ T U 



+ o P (l) 



02,3 — (Vt,2i ^3)^3,2^,2 — ^2^3 2 -> 2 T' 3 2 — t>2,3 

denotes the solution to the subproblem of the problem (b) corresponding to the stationary 
components. v\ 2 := (r^E^^)'^)- 1 ^. 

(II) Letting 5G-^\ and 5G-^ 2 denote the first and second block column of 5G it holds that 

TSG'. tl (I - MG&) = [-T5H'Z 2 iZ u 1 T5H' TSG' 3jl P] + o(l), (19) 
VfSG'. t2 (I - MG&) = Vf5G' 2y2 [-Z 2 iZ u 1 I 0]+o(l) (20) 

where 5H = SG 2 ,i - 5G 2i2 (rl 2 )'Ez^ 3 (z^ 3 y5G 3tl The term o P {\) is also 0((logT) a ). 

Part (I) of the lemma splits the estimation error $\hat\beta_{RRR,r} - b_r$ (up to errors of higher
order) into three terms: The first term asymptotically equals the estimation error in the
situation that the row space of $b_r$ is known and used in the estimation to obtain unrestricted
least squares estimators. The second term accounts for the effects of the rank restriction in
the nonstationary components of both the output (i.e. $\tilde y^{\pi}_{t,1}$) as well as the regressors (i.e.
$\tilde z^{\pi}_{t,i}, i = 1,2$). Part (II) of the lemma provides a more detailed expression for this term. The
last term corrects for the rank restriction in the stationary directions.

Up to now the exact forms of $\delta^{zz}$ and $J$ have been used only to show that
$\sqrt{T}\,\delta G_{3,1}'(I - \langle \tilde z^{\pi}_{t,3}, \tilde z^{\pi}_{t,3}\rangle\hat\Gamma_{3,2}\hat\Gamma_{3,2}^{\dagger}) \to 0$. All other results so far rely only on the order of convergence of these
terms. The proof of Theorem 3.2 is completed with the last lemma of this section, which gives
explicit expressions for the asymptotic distributions of the various parts of the expression
for $T_y(\hat\beta_{RRR,r} - b_r)T_{z,r}^{-1}D_{z,r}^{-1}$ given in Lemma A.8. The result then follows directly from (15).

Lemma A.9 With 6\ = (6' 2 (Ey^ 2 (y t n 2 )T ^"^(Ey"^" y)" 1 we have 

{eu~zli)Z^ A f(T y AW,W? :1 ), 
^/T{e u zl 3 ){~zl 3 rzl 3 )- 1 4 Z r ~N{ti,V), 

Vf5G 2 , 2 = (t- 1 (^ 2 . 1 ,^))- 1 (^2.i,^2)(02)' + op(i) 4m; 2 (oJ)', 

TP'SGs,, = Z^PVf(^ 3 ,e t>1 ) + Z3-3 1 Vf {?{~zl 3 ,y t>2 ) - PES t n 3 (y t n 2 )') 3' + o P (l) -> R', 
T5H = (T- 1 ^.!,^))- 1 [<^ 2 .i,e t) i) + <^2.i,^2)U-0 2 0S)'H'] +o p (1)An', 
VT[p 2 , 3 -p 2 , 3 ]P = 2 diVT(e t , 2 , z^iE-z^iz^y^P + opil), 



where P = (I - E^^T^r^). Further Y a = f W^W^)' '. #ere M r>2 = [0,/]M r witfi 
M r := f(T y AW,W^ 21 ) where W z>2 .i ■= W^-Y^Y^W^. N = [[/,0]+S(/-O 2 O|)[0,7]]M r . 

converge in distribution to Gaussian random variables with mean 

zero. 

Combining Lemma A. 8 and A. 9 we obtain that T y 0RRR >r — b^T^-D* 1 A- 

-NY 21 Y 11 1 N R 

-d 2 o\M r , 2 Y 2X Y^ o 2 6lM r>2 o 2 d\z r)2 p 

From this the claim follows using the block matrix inversion since T y (fioLS,r—b r yT~ x D^ 1 -4- 
[f(T y AW,W?),Z r ]. 
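For reference, the block matrix inversion invoked here is the standard partitioned-inverse identity. The symbols $A$, $B$, $C$, $D$ and $S_A$ below are generic placeholders introduced only for this illustration and are not quantities defined in the paper. Assuming that $A$ and the Schur complement $S_A = D - CA^{-1}B$ are nonsingular:

```latex
\[
\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1}
=
\begin{pmatrix}
A^{-1} + A^{-1} B S_A^{-1} C A^{-1} & -A^{-1} B S_A^{-1} \\
- S_A^{-1} C A^{-1} & S_A^{-1}
\end{pmatrix},
\qquad S_A = D - C A^{-1} B .
\]
```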

Using standard asymptotics for the term (et,zf)(zf, zf}' 1 we obtain the asymptotic dis- 
tribution of Prrr^u — Pols,u- stated in Theorem 3.2 from (15). Note in particular that 
R := Z r< iP — R where Z r ^ denotes the limit of \/T(et,i, z^ 3 )Z^ . This concludes the proof 
of (I) of Theorem 3.2. 

Following the arguments of the proof and using Lemma A.1 (IV), it follows that the usual
changes occur if a constant (and a deterministic trend, respectively) is included in the regression:
the asymptotics in the stationary directions are unchanged. For the nonstationary directions
the Brownian motions are replaced by their corresponding demeaned (and detrended,
respectively) versions. We omit details in this respect.
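As a rough finite-sample illustration of what demeaned and detrended versions look like, the following sketch simulates a scalar random walk (a discrete stand-in for a Brownian motion) and removes a constant, or a constant and a linear trend, by least squares. The function names `random_walk`, `demean` and `detrend`, the sample size and the seed are assumptions made for this example only; they do not appear in the paper.

```python
import random


def random_walk(n, seed=0):
    """Partial sums of iid standard normal increments."""
    rng = random.Random(seed)
    path, level = [], 0.0
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def demean(x):
    """Residuals from regressing x on a constant."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def detrend(x):
    """Residuals from regressing x on a constant and the trend t = 1..n."""
    n = len(x)
    t = list(range(1, n + 1))
    t_bar = sum(t) / n
    x_bar = sum(x) / n
    # Closed-form slope and intercept of a simple linear regression.
    s_tt = sum((ti - t_bar) ** 2 for ti in t)
    s_tx = sum((ti - t_bar) * (xi - x_bar) for ti, xi in zip(t, x))
    slope = s_tx / s_tt
    intercept = x_bar - slope * t_bar
    return [xi - intercept - slope * ti for ti, xi in zip(t, x)]


walk = random_walk(500, seed=1)
walk_demeaned = demean(walk)
walk_detrended = detrend(walk)
```

By construction the residual series are orthogonal to the removed regressors, which mirrors the way the limiting Brownian motions are projected off a constant (and a trend).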

A.4 Proof of Lemma A.7

In the proof the following results are used: With S := — 'Ket^y' t2 {^y^ 2 {y^ 2 )')~ 1 , zt :2 .i = 
h,2 — Z 2 iZ ± iZ tj i, J 2 .\,i = J2,i — Z 2 iZ ± ^Ji : i, J 2 .i,3 = ^2,3 — Z 2 \Z X ^ and e t; i. 2 = £t,i + ^Vt^2 
it holds that 

Vr\J^-5^] = E^ n 2 )'~' + 0p (l), 

TJ2.1.1 = (^2.1,^,1.2) +0P(1), 

VTJ 2A>3 = (^ 2 . 1 ,^ 2 )(Ey t n 2 (y t n 2 ) / )- 1 2 r^ 2 ^33 + op(l). 

where Z 33 := E5 t n 3 (5 t n 3 )'. 

The first claim follows from 

^K, - O = (vt,2,%i - yh) = -<%V~m> -> 



f(T y Aw,w^)AZ r z 33 T 3 , 2 rl 2 



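where $\tilde y^{\pi}_{t,2} = \tilde\varepsilon_{t,2} + b_{2,3}\tilde z_{t,3} - \langle \tilde\varepsilon_{t,2} + b_{2,3}\tilde z_{t,3}, z^u_t\rangle\langle z^u_t, z^u_t\rangle^{-1} z^u_t$. Then the result follows straightforwardly
since $\tilde z_{t,3}$ and $\tilde\varepsilon_{t,1}$ are stationary and uncorrelated by assumption. From this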
also the second and third claim follow immediately using the expressions for J^i derived 
in Lemma A. 7. The fourth claim also follows from these expressions noting that Ey^ 2 (z^ 3 )' = 

Now with respect to the blocks of $\delta G$, note that due to the chosen normalizations $\hat\Gamma_{1,1} =
G_{1,1} = I$, implying $\delta G_{1,1} = 0$.

Using the order of convergence for $J$, $\delta^{zz}$ (Lemma A.6 (I) and (II)) and $\delta G$ (Lemma A.6
(III)), the (1,2) block of (11) implies

$$
\delta G_{1,2} + Z_{11}^{-1}Z_{12}\,\delta G_{2,2} = Z_{11}^{-1}\bigl[\delta^{zz}_{1,3}G_{3,2}R_2^{2} - J_{1,3}G_{3,2}\bigr](I - R_2^{2})^{-1} + o(T^{-1}).
$$

The expression given then follows from noting that 6™ = 0((logT) a /T 1 / 2 ), G 3t2 = f 3)2 + 

OftogTf/T 1 / 2 ), J lj3 = 0((logT) 3 /T 1 / 2 ) and R 2 = Q 2 + 0((log Tf/T 1 / 2 ). 

The expressions for 5G2,i and SG 2t2 follow from the second block row of equation (12) noting 

that 



(I m - *f tf ') 



00 

-Z 21 Z^ I 

I-^t^t'^s^)- 1 ^ 



T-\ll 2A ,zl 2A ) 
P33 



Also R\ — I = 0((logT) 7 /T) follows from the (1, 1) entry of (11). Then the (3, 1) entry of 
equation (11) implies that 

$$
\delta G_{3,1} = S\,[J_{3,1} - \delta^{zz}_{3,1}] + o(T^{-1})
$$

where 5 _1 — > S 1-1 as 



which is ensured by the assumed nonsingularity of the covariance of $[(\tilde y^{\Pi}_{t,2})', (\tilde z^{\Pi}_{t,3})']'$. Further,
the (3,1) entry in equation (12) directly implies the expression for $P_{3,3}\,\delta G_{3,1}$ using the orders
of convergence derived above. Next

h,i-5f z = ^S-^+^.^aX^,^)- 1 ^-^) = r- 1 /2 [( ^3, £ ~ t!l)+( ^3,^ 2) (s'+ (i))]. 



Note that (^ 3 ,e t) i) = 0(Q T ), Z' 1 (z^y^) = T 3t2 6 2 +0(Q T ), P-P = 0((log Tf/T 1 ' 2 )) 
and thus v/T^^i = 

= VTP(J 31 -6? z )+o(T- 1/2 ) = J P(^ 3 ,e t ,i)+ J P(^3,^ 2 )(H / + (l))+o(T- 1 / 2 ) = 0((\ogTf IT 1 ' 2 ). 

Here the last order follows from PEz^^z^yr^ = as is easy to verify. 

Since 5G = 0((logT) 3 ) /T 1 / 2 ) also <5Gi, 2 = 0((log Tf /T 1 ' 2 ) and <5G 2 , 2 = 0((log T^/T 1 / 2 ). 
For f ± such that Y' ± Y 3j2 = we have that t' ± Z^P 33 = f" ± . Then the (3,2) block of (12) 
shows that 

[0,T' ± ]8G : , 2 = T' ± SG 3 , 2 = f' ± Z^P 33 8G 3t2 = [0,f' ± Z^][JG : , 2 Rf - 5 ZZ G : , 2 ] = o(T- 1 ' 2 ) 

since the (3,2) block entries of $JG$ and $\delta^{zz}G$ are both of order $o(T^{-1/2})$, as follows from the
norm bounds given for the blocks of $J$ and $\delta^{zz}$.

Due to the chosen normalization, $\hat\Gamma_{3,2}'S_{p,22} = G_{3,2}'S_{p,22} = I$ and thus $S_{p,22}'\,\delta G_{3,2} = 0$.
Since $[\hat\Gamma_{\perp}, S_{p,22}]$ is nonsingular (as is straightforward to see from $\hat\Gamma_{3,2}'S_{p,22} = I$) we obtain
$\delta G_{3,2} = o(T^{-1/2})$.

The order of convergence for $\delta G_{2,1}$ follows from the orders of convergence of $J$, $\delta^{zz}$ and $\delta G$
as derived in Lemma A.6.

A.5 Proof of Lemma A.8

(I) Let P RRR , r = TyfiRRR/T-} and b r = T y b r T~}. Then 

Prrr,t = (ffi, %)G& = b r (z?,z?)D z G&D' z + (i t , ~zl)b z G&b z 
= b r (z?, z?)D z G&D' z + [VT(e t , zf)D z ] WD Z 

where = stands for equality up to terms of order o(T _1 ) in the first c z columns and of order 
o(r -1 / 2 ) in the remaining columns. This follows from (e t ,zf)D z = 0((log Tf)T~ 1 / 2 ), 5G = 
0((logT) 3 T- 1 / 2 ) and the definition of D z = diag(T -1 /, T~ X I 2 1) showing that G can be 
replaced by T in the second term in the last equation with introduction of an error of the 
stated order. Now since b r is block diagonal 

Next note that (using the index ':' to denote block columns or rows resp.) ?v,i,: = [I, 0] = 
f' :1 . Therefore we have recalling that (G f )' = G(G' MG)~ l 

[1, 0] = G' :1 M(G^y = (f :jl )'M(G t ) / + SG'. tl M[&)' 
= b rX .M{&)' + 5G' :<1 M(&)' . 



This leads to b r ^.M{G^)' = [[1,0] - SG'.^M {G^)'\. Hence we obtain 

b rA ,(zf, z?)D z G&D z = TV2 [[/, o] - 6G[ tl M(&)'] G'D Z 

= W,!,: + T [5G'. tl ] [I - MGG+] 

Here G' :>1 = [I, 0] + 8G' :1 is used in the last line. 

With respect to the second block row it follows from 6 r ,2,: = [0,62,3] that 6 r ,2,: = b r ,2,-.D z . 
Also we have from 2 ,3 = 2 Y' 32 using G'. 2 MGG^ = G'. 2 by the definition of & that 



P 2 , 3 (zZ 3 ,z?)D z G&D z 



2 T' 3t2 (zl 3 ,zJ)D z G&D z = 2 T[ 2 MG&D z 

6 2 [G[ 2 D Z - 8G[ 2 MG&D Z ] = 2 [t' 2 + 5G', 2 (I - MG&)D Z ] 

%fc,z]+6 2 5a 2 {I - MG&)D g . 



This implies 



br2,..(z?,z?)D z G&D g = b r , 2 ,D z (z^Zt)D z G&D z 

= [62,3 " £2,3] <^ 3 > %)DzG&D g + ^2,3(^3, %)D g G&D g 

= [0, [62,3 - ^^(^ I^I^] + ^3^3, Z?)D g G&D, 



= [0, [62,3 - ^ 3 ]e4M^ 3)^3,2^,2] + [0, &, 3 ] + 2 *G( )2 



/ - MGG^ 



D z 



W,2, + [0, [62,3 - 02,3] Ezn(^ n 3)'r 3 ,2r5 i 2 - / 



+ e> 2 v / T<5G ! ' o I-MGG ] 



Here the third line follows from the orders of convergence in $\delta G = G - \hat\Gamma$ established
above and $b_{2,3} - \hat\beta_{2,3} = o(T^{-\epsilon})$, $\epsilon > 0$, as follows from standard theory in the stationary
case. Then the representation given in (I) is proved by replacing $\hat O_2$ by its limit $O_2$, which
introduces an error of the required form since $\hat O_2 - O_2 = o(T^{-\epsilon})$ for some $\epsilon > 0$, as follows
from $\hat\beta_{2,3} - b_{2,3} = o(T^{-\epsilon})$ (see the proof of Lemma A.3).

(II) For (20) note that 5G and M - * both are of order 0((log TfT' 1 / 2 ) and the two 
norm of ^ and T is of order O(logT). Therefore replacing MG& by 'iTft introduces an 
error of order 0((log T) 4 T -1 / 2 ) = o(l) proving (20) since «5G 3 ,2 = o{T- 1 / 2 ) (see Lemma A.7) 
and 

VT6G' :j2 (I - *fft) = Vt5G 2 ,2[-Z 2 iZ^ I 0] + \/T<5G 3i2 [0 I - z 33 v 3 , 2 r J j2 ]. 
With respect to (19) note that 



TSG'.i 



I - MGG ] 



TSG'.i 



i - *rr f 



+ TSG'. 1 



where 

T6G'. A \l- trft] = [-T5G' 2>l Z 21 Z^ T5G' 2>1 T5G' 3A P] (21) 

is obvious from the form of $I - \Psi\hat\Gamma\hat\Gamma^{\dagger}$ (see the proof of Lemma A.7).
Noting that $MGG^{\dagger} = MG(G'MG)^{-1}G'$ it follows that

SG'.^W - MGG^] = < 5G; il [(/-*ff t )(^-M)ff t -^f(f'^f)- 1 5G / (/-'J'rf t ) 

-(I - <MT f )*<5Gf t] + o{T- 1 ) 
= -5G' 3A (2Z 3 ,zZ 3 )T 3 , 2 (T>^ 

since <5Gi,i = o^ 1 ), 5G 2 ,i = 0((logT) 7 /T) and (see the proof of Lemma A.7 and (17)) 

>/f5G , 3il (J-Z33r 3) 2fJ 2 ) = VT[J 3 ^-5fJS(I-Z 33 t 3 ^ 32 ) + o{T^) 

= m, 3 ,e t ,i) + (zl 3 ,y t , 2 )E']'S(I - Z 33 T 3 , 2 f J 2 ) + o(T" e ) 
= o{T^) 

for some e > due to (z£ 3 ,£t,i) = o(T~ e ) for < e < 1/2 and {zf 3 ,y t;2 ) ->• Kz^ 3 (y^ 2 )' = 
^^3(^3)^3,2^2- N ° w {yhrzt,z)S{I - Z 33 T 3 , 2 T\ 2 ) -> according to 

5Z 33 f 3,2 ->• 5E5 t n 3 (z t n 3 )'r 3i2 . (22) 

Recall that by definition 

E^yr^el = E5 t n 3 (^)'(E^(^)')- 1 Ey t n 2 (z t n 3 )'r 3 , 2 . 

Together with the definition of 5 = (E^")' - E5 t n 3 (y t n 2 )'(Ey t n 2 (y t n 2 )0- 1 Ey t n 2 (5 t n 3 )')- 1 
this implies 

5E5 t n 3 (z t n 3 )'r 3 , 2 = r 3 , 2 (/-e2)- 1 . 

Finally >/T[<5^ - ^](^ 2 ,^ 2 ) _1 = H + o(T" e ) is used (as derived above). 
Since 5G 3)2 = o(T~ 1/2 ~ e ) (see Lemma A.7) it follows from the form of {I - #f f t) (see 
the proof of Lemma A.7) that 

WG!,i[ffft - MGGt] = -(VrtfGa.iJ'E^^nj'Cr^J'CVr^Jt-ZaxZil 1 , J,0] +o(l). 

(23) 

Combining (21) and (23) we obtain 

T&G'. i \l - MGG^J = T (<5G 2il - ^G 31 Ez t n 3 (z t n 3 ) / (r| )2 ) / 5G 2 2 y \—Zi\Z^, I, 0] + [0 TJG 3)1 P] +o(l). 
This completes the proof by the definition of $\delta H$.



A.6 Proof of Lemma A.9



The first claim is standard and follows from Lemma A.l using rit := [z' t 1 , (z^)']', vt := 7^Aej 
noting that then vt is a martingale difference. The second claim is a standard central limit 
result. Further from Lemma A. 7 we have 

Now from the proof of Lemma A. 7 

^2.1,3^,2 = (^2A,^2)(Ey t n 2 (^ n 2) , )" 1 02r^ 2 Z33r 3 ,2 + Op(l) 
= <^ 2 .1,^2>(E^ 2 ^ 

+(^ 2 .i,^3)r 3)2 ei + p(i) 

since 6 2 = O^Ey^y^y^O^r'g^Zssrs^) as is straightforward to show. This shows the 
expression for \/TbG2,2 since VTdfz' 3 = (££ 2 .i> *t,3)- 

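In order to obtain the expression for $T\hat P'\delta G_{3,1}$, note that from the definition of $\hat P$, $P_{33}$ and the
arguments in the proof of Lemma A.7

$$
\langle \tilde z^{\pi}_{t,3}, \tilde z^{\pi}_{t,3}\rangle\hat P'\delta G_{3,1} = P_{33}\,\delta G_{3,1} = \hat P\,[J_{3,1} - \delta^{zz}_{3,1}] + o(T^{-1}).
$$

Now $\sqrt{T}[J_{3,1} - \delta^{zz}_{3,1}] \to E\tilde z^{\Pi}_{t,3}(\tilde z^{\Pi}_{t,3})'\Gamma_{3,2}O_2'\Xi'$ according to the proof of Lemma A.7, and
$\hat P \to I - E\tilde z^{\Pi}_{t,3}(\tilde z^{\Pi}_{t,3})'\Gamma_{3,2}\Gamma_{3,2}^{\dagger}$, where the difference is of order $O_P(T^{-1/2})$ since only stationary
components are involved. The result then follows from the expression for $J_{3,1}$ according to
Lemma A.6 (II) since

$$
\bigl(I - E\tilde z^{\Pi}_{t,3}(\tilde z^{\Pi}_{t,3})'\Gamma_{3,2}\Gamma_{3,2}^{\dagger}\bigr)E\tilde z^{\Pi}_{t,3}(\tilde z^{\Pi}_{t,3})'\Gamma_{3,2}O_2'\Xi' = 0.
$$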

The evaluations for 5H are more involved: Using the expressions given in Lemma A. 7 and 
defining Z 22 .i = (^,2.1,^,2.1) we have 5H = 5G 2 ,i - ^^(r^'Ez^^)' '<5G 3 ,i = 



— ^22.1 

— 7- 1 
~ ^22.1 

— ^22.1 



J 2 .i,i + { J 2 .i,3 - Szz 1 ' 3 - (J 2 .i,3r3, 2 e 2 2 - & 1,3 r 3i2 )r+} <sg 3) i] + o^- 1 ) 

'^2.1^3,2(62 2 - /) + (J 2 .i l3 - & 1,3 )r 3)2 ) r + } sg 3>1 ] + o(t- 1 ) 



J' 



2.1,1 



^2.1,3 — Ozz 



72.1,1 + (<^2.1,: 



£■2.1,3 



) (/ - r 3 , 2 r+) «5G 3) i - J2.i,3r 3 ,2(e 2 - 2 - i){v\ i2 )'^{^)'sg^ + o^- 1 



= z 2 ~ 2 \ (J 2 .i,i - J2.i,3/v / Tr 3 ,2e 2 - 2 o 2 H') +o P (T- 1 ) 



(using r + := {r\ 2 ) / E5^ 3 (z^ [ 3 ) / ) where the last line follows from T( 7 2 .i ;3 — Sl'z^) [I — r 3)2 (r3 2 ) / Z 3 3j <5G 3; i 
in probability since VT[J 2 .i. 3 — bzz \ converges in distribution, VfSG 31 -> S , Z3 3 r 3i2 2 ~ / 



and SZ^Y^2 = ^3,2(1 — ©I)" 1 ( see (22))- The result then follows from some algebraic 
operations. 

The final statement applies Lemma A. 3: 

r' 3 , 2 - r 3 , 2 = o|((y- 2 , ^ 3 )((^ 3 , zi,))- 1 - M(i - s p , 22 ry + {t-W). (24) 

This shows the result since /3 2]3 — ^2,3 = (02 — C*2)r 32 + 02(r 32 — T 32 ) and r' 32 (/ — 
^33r 3 , 2 r| i2 ) -> 0. The fact that (/ - S^I^X* - Z 33 r 3i2 r^ 2 ) = (/ - Z 33 r 3i2 r| 2 ) simplifies 
the expressions. This concludes the proof. 

A.7 Proof of Theorem 3.3 

The proof follows closely the proof in the OLS case. The changes in comparison to the OLS 
case are that (yt, zt) is replaced with 

£ y ,z ■= (yt, 4) - A^A^ - Clu Az n^l Az ((Az t , zf) - A A z,A^) = (Vt, z t) + B + 

where hence the additional term is called B + and in the SVD the weighting is not based on 
(y?,y?) but on W + = ((y?,y?) + (7+)^ where 

The asymptotics for the additional terms are detailed in the lemma below: 
Lemma A. 10 Under the assumptions of Theorem 3.3 the following holds: 

Au A ^+nu Az ti A l Az ((Az t rz?)-A AzAz ,) = [Q^n^Azr'fdBnB^ + opil) opiT- 1 / 2 )], 

(luA^AzAz^^Vt) - A AzAr ) = [ ^llX^)- 1 JdB n B' n ( J ) + op(l) o P (T-^) 

Proof: The proof of the first statement uses the fact that according to Lemma A. 2 A^Az = 
[op(l), op(T _1 / 2 )]. Here the restrictions on the increase of K as a function of T is used 
such that we obtain \jKjT = o p (1),1/VKT = c^T" 1 / 2 ), K~ 2 = o P {T- 1 / 2 ). Hence it is 
of lower order compared to the leading terms. Further f^Az^Az Az ((Az t , z t ) — Aaz.Az) = 
[Op(l), op(T" x / 2 )] and hence these terms are of the same order as the leading terms. The 
proof of the second statement is an easy consequence of the results listed in Lemma A. 2 using 
yj = b r zj + ej. Note that 

(Azt,yt) ~ AazAv* = (.(Az t ,zt) - A AzAz n)b' r + ((Az t ,u t ) - A Az ,a«-) 



Convergence for the first summand is contained as the first statement in the lemma, while
again following Lemma A.2 we obtain convergence for the second term. □



For both $\Sigma_{y,z}$ and $W^{+}$, after transformation using the matrices $T_y$, $T_z$, the additional terms
in the diagonal blocks are of lower order than the original terms. For the off-diagonal blocks
the additional terms are of the same order in probability. This follows from the results in
Lemma A.2. However, for the off-diagonal terms the consistency proof uses only the order of
convergence. Consequently the consistency result and the order of convergence (in
probability) also hold in the FM case.

In what follows we use the definitions below, keeping the notation of the
OLS case in order to avoid introducing new symbols.



Q 



y,z 



M := (D z z?,D z %), 




^ 1 {^t,l^t,2) 










(Zt,3iyt,2)^y2,y2(yt,2i *t,3,) 



* : = 



Here 












£ y 2,y2 := (Vt,2,Vt?)-M \^^,Az({ A ^ Vt) ~ A A,,Ay) " ((A^jf?) - A AZ)A ^(^ >A ^^ A J 



such that T, y2 ,y2 ->■ S y2 ,y2 := Eyf 2 (y 



r,n r~n y 



Hence the definition of $Q$ is adapted to the SVD occurring in the FM estimation. Also
the (3,3) entry of $\Phi$ is changed slightly. The reason for this becomes visible in Lemma A.11 below,
which is the analogue of Lemma A.6 for the OLS case.

Lemma A. 11 Let the assumptions of Theorem 3.3 hold. 

(I) Partition the matrices $Q$, $M$, $\Phi$, $\Psi$ according to the partitioning of $\tilde z_t$, denoting the various
blocks using subscripts. Then:

OpiT- 1 ' 2 ) 



5 ZZ := M-tf = 



P {T^/ 2 ) 

OpiT- 1 / 2 ) P {T~ 1 ' 2 ) 



Jyz 



'H(y 

T 



T- 1/2 (yl 2 ,zZ 2 ) 



+ D V B+D Z 



P (T- r ) P {T' V ) OpiT- 1 / 2 ) 
OpiT- 1 / 2 ) OpiT- 1 / 2 ) o P (l) 



*yy •■= D y ((y?,yJ) + C + - 



( z t,i' z t,i 











Op(T~ l ) OpiT- 1 / 2 ) 
Op^- 1 / 2 ) 



(II) Let J := Q — To simplify notation define Zij := T (z^, zjj), = 1'^- Then 



J, 



h3 



+ [5% - Z^Z- 1 Sl 2 y ]^ 2 [S 2 y i - 5f y Z- 1 Z l] ] + op^ 1 ), 



(25) 



J: 



3,3 



^.fir^^-^J ^'vlyM^h)-^ +op(T- 1 ) 



for $i = 1,2$, $j = 1,2$, where expressions for the remaining blocks of $J$ follow from symmetry.
Hence $J_{i,j} = O_P(T^{-1})$ for $i,j = 1,2$. Further $J_{3,i} = O_P(T^{-1/2})$ for $i = 1,2,3$. $J_{3,3} =
O_P(T^{-1})$ and $J_{3,3} = O((\log T)^{3}/T)$ respectively.
(III) $\delta G = O_P(T^{-1/2})$.

Proof: (I) follows from Lemma A.10. Note that, compared to the OLS case, in the (3,3)
entry of $\Phi$ the matrix $\langle \tilde y^{\pi}_{t,2}, \tilde y^{\pi}_{t,2}\rangle$ is replaced with $\hat\Sigma_{y_2,y_2} = \langle \tilde y^{\pi}_{t,2}, \tilde y^{\pi}_{t,2}\rangle + O_P(T^{-1/2})$ in order to
obtain $\delta_{22} = 0$ rather than $o_P(T^{-1/2})$. Only the in-probability statements are used.

The proof of (II) is then unchanged except that $\delta^{yy} = o_P(T^{-1/2})$ needs to be taken into
account. (III) is then immediate. □

Next, the proof of Lemma A.7 uses only the results of Lemma A.6 and equation (12) in
combination with the following limit results:

v^C-O^,^)- 1 = H + o(l), 

Vf[J 3l i - & 1 ] = E^ n 3 (y t n 2 ) / H' + op(l), 
T(zl 2A ,~zl 2 . l )- 1 J 2 A,i = {T-^zl^-zl^))- 1 ^{zl^et^ 

^T{~Zt,2.1,~zl2.ir 1 J2.1,3 = (^2.1, ^.l)" 1 ((^2.1,^2+) + ^2.1,2) / S^ 2 2 r 3i2 Z 33 + p(l). 

where again E := — Ee ti iy^ 2 S~ 2 1 y2 and it,i.2 = £t,i + ^2/^2- Further (B + denoting the trans- 
formed quantity B + ) 



£1,2.1 = [I,0]B + [-Z2iZ u 1 ,I,0]', B 2 , 2 .i = [0,I]B + [-Z 21 Z u 1 ,I,0]'. 



Here the first statement follows from Lemma A.10, and the second from the fact that the
(1,3) block of $\Sigma_{y,z}$ is of order $o_P(T^{-1/2})$, since it relates to a stationary component of the regressors.
The remaining statements follow straightforwardly from the definition of $J$.

Lemma A.8 needs to be changed slightly by replacing the almost sure statements by the corresponding
in-probability versions.

Lemma A. 12 Let the assumptions of Theorem 3.3 hold. 
(I) Then 



1 n-1 



RRR,r 



~ b r )T-'D z 



= Vt 



+ 







b z t f t + 



TI 

o Vto 



SG'(I -MG&) 



. o o VT[h,-hm-^u^)' T ^2\ J +op(1) 



where 



= (^2,^3)r 3) 2fS )2 = o 2 f' 3j2 o 2 r 3j2 = 6 2j3 

denotes the solution to the subproblem of the problem (b) corresponding to the stationary 
components. t\ 2 := (P^zfe, ^ 3 )f 3i2 )- 1 f ' 32 . 

(II) Letting 5G- t i and 5G l:2 denote the first and second block column of 5G it holds that 

TSG' :1 (I - MGG f ) = [-T5H'Z 2 iZ u 1 T5H' TSG'^P] + o P (l), (26) 
VT5G'. 2 (I - MGG^) = VT5G' 2)2 [-Z 2 iZ n 1 I 0]+o P (l) (27) 

where 5H = SG 2 ,i - 5G 2y2 {Tl 2 )'Wz^ 3 (z^ 3 )' 5G^ and P = I - Z^r^f J )2 . 

The only change in the proof consists in exchanging the estimation error of the OLS estimator
for the estimation error of the FM estimator in (I). The rest of the proof is analogous
to the OLS case and hence omitted. Primarily the orders of convergence derived above are
used.

It remains to analyze the asymptotic distribution of the various terms. This is done in
the analogue of Lemma A.9:

Lemma A.13 With 6\ = (0 2 (Ey t n 2 (y t n 2 n -1 ^)"^^^)') -1 we have 

d 



{(e t ,% tl ) + B+)Z£ 4 f(T y AB,W*,0), 
VT(i t ,zl 3 ) A A/"(0,n 



VT8G 2 , 2 = Z 2 - 2 1 1 ((^ 2 ,^ 2 . 1 ) + B 2i2 J'(Ot)' + 0p (i)4M^ i+ (Ot)' J 



TP'5G 3A = ^3 1 P / ^(^3, e l il . 2 ) + Z 3 -3 1 ^(P-P)'Ezn(% n 2 )'H / + op(l)^ J R / , 



T6H 



J 22.1 



(^.i,e i ,i) + 5Ui+ ((^2.1,^2) + ^2,2.1) {I-0 2 d\)'E' 



+ o P (l) 



VT[p 2i3 -p 2 ,3]P = d 2 dlVT(e t , 2 ,z^)(Ez^(z^y)- 1 P + o P (l), 

where P = (7 - /r^r^). Here M 2 , + = /([O, 7]T y 5, W^ 2 - Y^Y^W^O) and 

N + = /([[/, 0] + E(J - O 2 O^)[0, /]]r y 5, W 2 n 2 - Y 21 Y^ 1 W^ 1 ,0). TP8G 3A and y/T^s - P 2 ]P 
converge in distribution to Gaussian random variables with mean zero. 



The proof follows analogously to the OLS case. The remaining steps of the proof are 
analogous to the proof for Theorem 3.2 and hence omitted. 



B Collection of notation 



In this section the notation is collected for ease of reference. The general concept
is to use lower case letters for processes (where $y$ is reserved for the dependent variable, $z$
denotes regressors, $v, w, u$ are reserved for stationary processes, and $\varepsilon, \eta$ denote white noise).
Processes built using a number of coordinates of other processes are indicated using sub- or
superscripts. Regression residuals are indicated using a superscript $\pi$ (where the regressors
are clear from the context) and their corresponding limits with a superscript $\Pi$ (this notation
is only used if limits exist).

Upper case letters are used for matrices. Matrices that transform the basis of processes 
are indicated using T where the transformed process is indicated as a subscript. Scaling 
matrices that are introduced in order to ensure the convergence of matrices are denoted using 
D with subscripts denoting the processes to which they are applied. 

Estimates in the original basis are denoted using a •, in the transformed basis (see Theo- 
rem 3.1) with a • and in the transformed basis with appropriate scaling ensuring convergence 
with a • or • respectively. 

B.l Processes 

Below at,bt are used to denote arbitrary processes, where the notation applies to a number 
of different processes. 



Vt 

zt 

(at,b t ) 



a+ = 



zt 



b r z[ + b u z't + Le t , 



z t 

~u 
z t 

Vt 



, diag(A, I)H' r z r t = v t , diag(A, I)H' u z^ = w t , 
= c(z)e t ,c(0) = 0,detc(l) / 0, 



t=i 

a t - (a*, zj*) -1 ^ (-> a t n (if convergent)) , 



Vt = T y (yt - b u zf) = b r z t + e t = 



10 

6 2 , 3 J 



Zt,l 
Zt,2 
Zt,3 



+ 



Tz,r z t 



Zt,l 
Zt,2 
Zt,3 



c Z! i(z)e t ,Az t ,2 = c Zt2 (z)e t ,zt,3 = c z ^(z)e t stationary 



£ t,i 



T 7 U 



zu 

t,l 

7,11 

t,2 



A5 "i = c M) i(z)e t ,z" 2 = c Ui2 (»£ t stationary, 
P..i(l) 



c 2 , 2 (l) 



is of full row rank. 



B.2 Matrices 

$b_r \in \mathbb{R}^{s \times m_r}$ coefficient matrix corresponding to $z^r_t$.

$b_u \in \mathbb{R}^{s \times m_u}$ coefficient matrix corresponding to $z^u_t$.

$b = [b_r, b_u]$ coefficient matrix corresponding to $z_t$.

$T_y \in \mathbb{R}^{s \times s}$ used to transform $y_t$ into $\tilde y_t$ separating stationary from nonstationary terms.

$T_{z,r} \in \mathbb{R}^{m_r \times m_r}$ used to transform $z^r_t$ into $\tilde z_t$ separating stationary from nonstationary terms.

$T_{z,u} \in \mathbb{R}^{m_u \times m_u}$ used to transform $z^u_t$ into $\tilde z^u_t$ separating stationary from nonstationary terms.

$T_z = \mathrm{diag}(T_{z,r}, T_{z,u})$.

$\tilde b_u = T_y b_u T_{z,u}^{-1}$.

Pols OLS estimator of f3. 
Pols,t OLS estimator of b r 
$OLS,u OLS estimator of b u 

Pols OLS estimator of b. 
Pols,t OLS estimator of b r 
Pols,u OLS estimator of b u 

Prrr RRR estimator of b. 
Prrr,t RRR estimator of b r 
Prrr,u RRR estimator of b u 



0RRR RRR estimator of b. 

$RRR,r RRR estimator of b r 

(3rrr,u RRR estimator of b u 

E + eR sxs weighting matrix. 

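$\Xi = -E\tilde\varepsilon_{t,1}(\tilde y^{\Pi}_{t,2})'\bigl(E\tilde y^{\Pi}_{t,2}(\tilde y^{\Pi}_{t,2})'\bigr)^{-1}$.

$M_r = f(W, W_z^{\Pi})$ where $W$ denotes the Brownian motion corresponding to $(\varepsilon_t)_{t \in \mathbb{Z}}$, $W_z =
c_{1:2}(1)W$, $W_u = c_{u,1}(1)W$, $W_z^{\Pi} = W_z - \int W_z W_u'(\int W_u W_u')^{-1}W_u$. Further $f(W_1, W_2) =
\int dW_1 W_2'(\int W_2 W_2')^{-1}$.

$M_u = f(W, W_u)$.

$N_r = \int W_z W_u'(\int W_u W_u')^{-1}$.

$M_{r,2} = f([0, I]T_y W, W^{\Pi}_{z,2} - Y_{21}Y_{11}^{-1}W^{\Pi}_{z,1})$.

$P = I - E\tilde z^{\Pi}_{t,3}(\tilde z^{\Pi}_{t,3})'\Gamma_{3,2}\Gamma_{3,2}^{\dagger}$, $\Gamma_{3,2}^{\dagger} = \bigl(\Gamma_{3,2}'E\tilde z^{\Pi}_{t,3}(\tilde z^{\Pi}_{t,3})'\Gamma_{3,2}\bigr)^{-1}\Gamma_{3,2}'$.

$P_{3,3} = \langle \tilde z^{\pi}_{t,3}, \tilde z^{\pi}_{t,3}\rangle - \langle \tilde z^{\pi}_{t,3}, \tilde z^{\pi}_{t,3}\rangle\hat\Gamma_{3,2}\hat\Gamma_{3,2}^{\dagger}\langle \tilde z^{\pi}_{t,3}, \tilde z^{\pi}_{t,3}\rangle$.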

$D_y = \mathrm{diag}(T^{-1}I_{c_y}, T^{-1/2}I)$ proper scaling for $\tilde y_t$.
$D_{z,r} = \mathrm{diag}(T^{-1}I_{c_r}, T^{-1/2}I)$ proper scaling for $\tilde z^r_t$.
$D_{z,u} = \mathrm{diag}(T^{-1}I_{c_u}, T^{-1/2}I)$ proper scaling for $\tilde z^u_t$.

D z = diag{D Z)r ,D z ^ u ) 

D z = D Z T X I 2 = diag(T- 1 /2/ ) /), f) y = DyT 1 / 2 = diag(T- 1 /2/ ) /) 
G = D- Z X G 

Q = (D z z?,D y y?)(D y y?,D y y?)-^D y y?,D z z?) 

M = (D z z?,D z z?) 

' T- 1 ^!,^) T- l {~z^,z^ 2 ) 

# = T-^zl,) T-^z^Cz^zlir^z^zl,) 

(^WfiKvt^yt^-Hvh,^) . 



Original Formula 




Transformed Formula 
Relations 


~4)G = (z?, ~z?)GR 2 
G — 1~ z 


Scaled Formula 
Restrictions 
Relations 


QG = MGR 2 
[1, 0]G : ,i = /, S' Pi2 G..,2 = I,R 2 = diag(i?f , Rl) 
G = f)- l G 


Decoupled Formula 
Restrictions 

Relations 


$r 

f'S p = I, (■ 

f = 


= *re 2 

5 2 = diag(, 
" / 


r 3 , 2 




Stationary subproblem Formula 

Restrictions 
Relations 


^3,2^,22 = I 

converges to & 2 ,3 = C^T^ 2 . 



Table 1: Singular value decompositions used in the article. 
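The scaled formula $QG = MGR^2$ in Table 1 is a generalized symmetric eigenproblem. One common way to solve such a problem numerically, sketched below with NumPy, is to reduce it to an ordinary symmetric eigenproblem via a Cholesky factor of $M$. This is only an illustrative sketch under the assumption that $M$ is symmetric positive definite and $Q$ is symmetric. The function name `generalized_eig`, the toy data and the normalization $G'MG = I$ are choices made here and differ from the normalizations listed in the table.

```python
import numpy as np


def generalized_eig(Q, M):
    """Solve Q G = M G diag(r2) for symmetric Q and positive definite M.

    Returns the eigenvalues r2 in descending order and a matrix G whose
    columns are normalized so that G' M G = I.
    """
    L = np.linalg.cholesky(M)            # M = L L'
    L_inv = np.linalg.inv(L)
    A = L_inv @ Q @ L_inv.T              # ordinary symmetric eigenproblem
    r2, V = np.linalg.eigh(A)            # eigenvalues in ascending order
    order = np.argsort(r2)[::-1]
    r2, V = r2[order], V[:, order]
    G = L_inv.T @ V                      # back-transform the eigenvectors
    return r2, G


# Toy data (hypothetical): 4 regressors, 3 outputs, coefficient rank 2.
rng = np.random.default_rng(0)
Z = rng.standard_normal((200, 4))
Y = Z[:, :2] @ rng.standard_normal((2, 3)) + rng.standard_normal((200, 3))

M = Z.T @ Z / 200
Q = (Z.T @ Y) @ np.linalg.inv(Y.T @ Y) @ (Y.T @ Z) / 200
r2, G = generalized_eig(Q, M)
```

With these toy moment matrices the eigenvalues are squared sample canonical correlations between `Y` and `Z`, so they lie in [0, 1]. In a reduced rank regression one would typically keep the columns of `G` belonging to the largest eigenvalues.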



q> = 



T-^z^zl,) T-^z^zh) 

<^ 3 >^3> 



J =Q-$. 



5, 



2/2 



M - * 



T- 1/2 (yZ2,zf,i) T-^iy^z^) 

T-^iy-y-) 



5 = ((^ 3 »^3> - (z^y^iy^yhr'iyt^zls))- 1 . 



r t = (r'^r)-^ 



«5G = G — f 

0| = (0 2 (Ey t n 2 ^ 2 )- 1 2 )- 1 2 (Ey t n 2 ^ i2 )- 1 . 
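The functional $f(W_1, W_2) = \int dW_1 W_2' (\int W_2 W_2')^{-1}$ used in the limiting distributions can be approximated by replacing the Brownian motions with scaled random walks. The sketch below does this for the scalar case only. The function name `f_discrete`, the sample size and the seed are assumptions of this illustration, and the left-endpoint discretization is one common choice rather than anything prescribed in the paper.

```python
import random


def f_discrete(eps1, eps2):
    """Scalar discrete analogue of f(W1, W2) = int dW1 W2 / int W2^2.

    eps1 and eps2 are increment sequences of equal length n. W2 is
    evaluated at the left endpoint of each increment of W1.
    """
    n = len(eps1)
    level = 0.0   # partial sum of eps2 up to time t-1
    num = 0.0     # sum_t eps1_t * S2_{t-1}
    den = 0.0     # sum_t S2_{t-1}^2
    for e1, e2 in zip(eps1, eps2):
        num += e1 * level
        den += level * level
        level += e2
    # num / n approximates int dW1 W2, den / n^2 approximates int W2^2.
    return (num / n) / (den / n ** 2)


rng = random.Random(42)
n = 2000
eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
other = [rng.gauss(0.0, 1.0) for _ in range(n)]

value_same = f_discrete(eps, eps)      # W1 = W2
value_indep = f_discrete(other, eps)   # W1 independent of W2
```

For $W_1 = W_2$ the numerator satisfies the exact summation-by-parts identity $\sum_t \varepsilon_t S_{t-1} = (S_n^2 - \sum_t \varepsilon_t^2)/2$, the discrete counterpart of $\int W\,dW = (W(1)^2 - 1)/2$.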



B.3 Singular value decompositions 



